INTRODUCTION


During a two-year grant period, several research tasks were completed in
continuation of the long-range objective of developing numerical methods for
complex flow fields. The major emphasis has been on algorithms that recognize
how well the numerical procedure matches the physical description being
evaluated. This has involved extending our adaptive techniques to include
finite element and Navier-Stokes components, together with a peripheral study
of analytical and machine architecture possibilities for accelerating
convergence.

The  physical  impetus  is  the  nature  of  fluid  fields  such  that  individual 
events  often  occur  in  relative  isolation  and  therefore  with  a  scale  unique 
to  the  local  physical  feature.  However,  overlaps  and  interactions  between 
features  do  occur  and  are  of  some  importance.  The  numerical  stimulus  is  the 
advantage  that  results  from  avoiding  procedures  carried  out  globally  when  a 
specific  aspect  of  the  discrete  methodology  is  present  solely  to  meet  a 
local  need.  The  research  effort  has  focused  on  linking  the  two  drives  by 
way  of  unstructured  grid  and  descriptive  equation  modifications  that  are 
introduced  as  the  numerical  integration  is  carried  out  and  only  where  required 
in  a  space/time  domain. 

In  a  broader  sense  this  amounts  to  achieving  robust  integration  while 
ensuring  computational  efficiencies  in  both  storage  and  time,  and  thus 
attention  has  also  been  given  to  boundary  condition  and  parallel  processing 
questions  which  are  not  limited  to  an  adaptive  approach. 

Specific  efforts  have  been  devoted  to  embedded  adaptation  using  finite 
element  methods  for  both  two  and  three  dimensional  steady  state  Euler  systems 
[6,  11,  13],  as  well  as  finite  difference  methods  for  adaptive  Navier-Stokes, 
steady  and  unsteady,  two-dimensional  fields  [2,  12,  14-19].  Each  has  been 
demonstrated  to  be  effective  in  terms  of  completed  flow  fields  and  their 
comparative  efficiencies  when  measured  against  equivalent  accuracies  and 
resolution  obtained  with  non-adaptive  procedures.  Non-reflecting  outflow 
boundary  conditions  have  also  been  given  consideration  for  time  consistent 
schemes  for  viscous  flow,  and  evaluations  have  been  completed  to  map  out  the 
behavior  of  various  CFD  algorithms  on  a  number  of  parallel  processor 
architectures [10, 20].


EMBEDDED  ADAPTIVE  ALGORITHMS 
Adaptive  Finite  Element  Methods 

Work was completed on the development of adaptive finite element method
(FEM) algorithms for the two- and three-dimensional steady state Euler
equations. The technical results are summarized in full in the PhD thesis of
Richard Shapiro [11], as well as in three papers at technical meetings
[6, 9, 13]. A
paper  delivered  at  the  January  1988  AIAA  Aerospace  Sciences  meeting  in  Reno 
[6]  and  attached  as  Appendix  4  presents  two-dimensional  results  for  different 
FEM  formulations.  A  second  paper  [13]  presented  at  the  January  1989  AIAA 
Aerospace  Sciences  meeting  in  Reno  (Appendix  10)  gave  three-dimensional  and 
new  higher  order  (biquadratic)  two-dimensional  FEM  results.  The  investigation 
into  finite  element  algorithms  led  to  an  unanticipated  inquiry  into  the  source 
of dispersion errors for discrete Euler algorithms. These results were presented
at the Second International Conference on Hyperbolic Problems: Theory, Numerical
Methods and Applications, held in Aachen during March 1988 [9], and the paper is
attached  as  Appendix  6.  In  the  following  paragraphs,  the  highlights  of  all 
these  findings  are  discussed.  The  interested  reader  is  referred  to  Shapiro's 
thesis  or  the  papers  for  more  details. 

The  major  effort  on  developing  adaptive  finite  element  methods  for  the 
steady  state  Euler  equations  was  spent  on  two-dimensional  geometries  in  order 
to  reduce  computational  expense  and  complexity.  The  general  approach  was  to 
use  FEM  discretization  in  space,  Runge-Kutta  integration  in  time  with  local 
time  stepping,  characteristic  boundary  conditions  on  open  boundaries,  and 
added  second  and  fourth  order  dissipation  terms  for  suppressing  background 
oscillations  and  capturing  shock  waves.  Quadrilateral  rather  than  triangular 
elements  were  used  since  they  require  less  computer  memory  and  are  easier  to 
generate.  In  theory,  the  quadrilateral  elements  are  more  accurate  for  a  given 
shape  function,  but  in  practice  this  effect  may  not  be  noticeable.  Triangular 
elements  can  be  used  where  needed  to  fit  body  geometries. 
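
As an illustration of the local time stepping mentioned above, the following
minimal sketch computes a separate time step for each cell of a 1-D mesh from
a CFL-type limit. The 1-D setting, the function name `local_time_steps`, and
the default CFL number are assumptions made for illustration; they are not
taken from the FEM code described here.

```python
def local_time_steps(dx, speed, sound_speed, cfl=0.5):
    """Return one time step per cell: dt_i = cfl * dx_i / (|u_i| + c_i).

    dx          -- cell widths
    speed       -- flow speed u in each cell
    sound_speed -- speed of sound c in each cell
    cfl         -- CFL number (illustrative default, an assumption)
    """
    # |u| + c is the fastest characteristic speed in 1-D, so each cell
    # takes the largest step its own size and wave speed allow.
    return [cfl * h / (abs(u) + c) for h, u, c in zip(dx, speed, sound_speed)]
```

Because each cell advances with its own step, the iteration is not time
accurate and is suited only to converging toward a steady state.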

The  various  topics  which  have  been  addressed  for  2-D  geometries  included: 

o  For bilinear shape functions, different choices of test functions were
   analyzed and tried. The best choice was a constant test function, which
   gives a scheme equivalent to a cell vertex finite volume method.

o  The idea of using a special element when the mesh is exactly Cartesian
   was investigated. However, the saving in the overall operation count
   was not big enough to justify the additional coding requirements, and
   the idea was abandoned.

o  Biquadratic shape functions were investigated for nonadaptive meshes.
   The results for the two-dimensional test problems proved to be very
   encouraging, with significant savings in computational time for
   equivalent accuracy.

o  Two different smoothing formulations were tried, a "low" order and a
   "high" order accurate version. It was found that the overall accuracy
   of the different FEM discretizations is affected more by the accuracy
   of the smoothing than by the accuracy of the underlying discretization
   method for the convective terms.

o  Numerous calculations were done for scramjet inlet geometries and a
   bump in a channel. The adaptive FEM algorithm proved to be robust
   and capable of capturing shocks and slip surfaces.

Based upon the experience of the 2-D studies, 3-D calculations were
undertaken using the cell vertex FEM method. Calculations to verify the 3-D
code were done for a double-wedge compression surface. Both non-adaptive and
adaptive calculations were performed, but the latter should be considered as
a first attempt. Further refinement needs to be done on adaptation criteria
for 3-D flows. To our knowledge, this represents the first adaptive 3-D
hexahedral FEM calculations for the Euler equations. The final 3-D calculation
was for a scramjet inlet. The basic features of the flow were resolved, but
the lack of suitable computer capacity did not permit enough cells to be used.
These results are reported in [11] and [13]. Calculations were initiated for a
delta wing with the aim of modeling the leading edge vortex. However, results
were not obtained by the conclusion of the grant. The effort will continue
in a subsequent grant.
Lastly, in the course of the development of the different FEM algorithms,
dispersion errors were evident for flows involving weak shock waves and
expansion fans. This led to a new analysis of dispersion phenomena using
spatial group velocity methods. The analysis correctly predicts whether
dispersion phenomena ("wiggles") will appear ahead of or behind the
disturbance and also predicts the wavelength of the oscillations. Different
spatial discretizations, mesh aspect ratios, and local Mach number all
influence these effects in a predictable way described by the theory. The
analysis and results are given in [11] and [9].


Adaptive Navier-Stokes Methods

Our original embedded adaptation scheme (3) was limited to Euler modeling
and therefore the capture of features which represented either large
disturbance regions (e.g. the near field at an airfoil leading edge) or sharp
discontinuities (e.g. shocks and slip surfaces representative of wakes).
Fundamental aspects of the algorithm were formulated there to provide a proper
data structure for the inherent unstructured grid that results for arbitrary
features, as well as the modifications that might be required for internal
interfaces created by the embedded grid scales.

The extension of the concept to allow Navier-Stokes modeling introduced
a number of additional algorithm settings. These included:

o  new discreteness modeling of the stencil for the second derivative
   viscous terms, to ensure the individual integration of an unstructured
   cell array, independent of each cell's neighbors for computational
   purposes

o  allowance for multiple detection parameters, such as density differences
   for discontinuities and stresses for viscous diffusion regions, and
   their combined governance of the embedding needs within interaction
   regions

o  a directional embedding procedure in recognition of the frequent
   presence of elongated features such as shock fronts and boundary
   layers

o  special implementation of turbulence models for consistency with
   the appearance of unstructured gridding and interfaces in regions
   near a boundary and/or within the core of a trailing viscous wake

o  alternate choices of appropriate orders of conservation and accuracy
   for interfaces within interior and exterior regions of embedding with
   respect to a given kind of feature

o  combined use of redistribution and embedding in establishing the
   initial stages of adapted regions with very highly concentrated fine
   gridding in large Reynolds number boundary layers

o  a spatial variation of time steps corresponding to the spatial
   adaptation, to alleviate the stiffness associated with time accurate
   computations
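
The multiple-detection-parameter item above can be illustrated with a small
sketch. It assumes a 1-D row of cells, normalizes each detector by its field
maximum, and flags a cell for embedding when either detector exceeds a
threshold. The function names, the normalization, the thresholds, and the
simple "either detector" combination are all assumptions for illustration;
the report does not state how the parameters are combined.

```python
def normalize(values):
    """Scale magnitudes by the field maximum (all zeros if the field is flat)."""
    mags = [abs(v) for v in values]
    peak = max(mags)
    return [m / peak if peak > 0 else 0.0 for m in mags]


def flag_cells(density, shear_stress, density_tol=0.5, stress_tol=0.5):
    """Return one boolean per cell: True where the cell should be embedded.

    density_tol and stress_tol are illustrative thresholds (assumptions).
    """
    # Density jump between each cell and its right-hand neighbour; the last
    # cell has no right-hand neighbour, so its jump is taken as zero.
    jumps = [density[i + 1] - density[i] for i in range(len(density) - 1)] + [0.0]
    shock_like = normalize(jumps)            # discontinuity detector
    viscous_like = normalize(shear_stress)   # viscous-region detector
    # A cell is flagged if either detector fires.
    return [s > density_tol or v > stress_tol
            for s, v in zip(shock_like, viscous_like)]
```

For example, a density step between the third and fourth cells and a stress
peak in the last cell flag exactly those two locations.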

A number of fields were calculated to demonstrate the algorithm's
utility, including single element airfoils at transonic speeds (M = 0.75,
Reynolds number = 3.8 x 10^) and two element airfoils with different flap
settings at low speed (M = 0.19, Reynolds number = 2.5 x 10^6). Details of
separated regions, the presence and behavior of separation bubbles,
gap flow between airfoil components, and shocks were captured with good
fidelity using two to four adaptation levels.

Different aspects of the work have been reported at a number of forums.
The initial Navier-Stokes effort was presented in Honolulu (2) and unsteady
and turbulent modifications first appeared at an International Conference in
Williamsburg and since then in the published proceedings (12). Two
chapters of a book on viscous computations were prepared on invitation
(16, 17) and some essential elements of the approach were included at an
International Conference in Huntsville (14). A very detailed description
of the algorithm, development, validations and applications is the subject
of a doctoral thesis by Kallinderis (15), and final summaries have been
prepared for submission to the AIAA Journal (18) and the next Reno Aerospace
Sciences Meeting in January 1990 (19).

OUTFLOW  BOUNDARY  CONDITIONS  AND 
COMPACT  HIGH  ORDER  SCHEMES  FOR  NAVIER-STOKES  EQUATIONS 

Non-reflecting  Downstream  Boundary  Condition  for  the  Wake  Problem 

As indicated previously¹, the value of β, the dimensionless decay rate,
and of ω, the dimensionless group velocity, for long wave perturbations was
to be computed for a range of the parameters M, e and K. This was done in
1988². The main result is that for subsonic flow with high Reynolds number
β is practically a constant, β ≈ π²/4, for all practical configurations
(.7 ≤ K ≤ .95). The group velocity for M∞ ≤ .3 and K > .7 is practically
linear in K with the slope depending on the Mach number. For laminar flows,
even for numerical boundaries relatively near the trailing edge of an
airfoil (say about 5 chords away) the drag parameter K is near 1, namely
K > .91. For this range of K, ω is independent of M and is given
approximately by

        ω = 1.15 - 2.16K        (1 > K ≥ .9)

The above values for β and ω are appropriate for use in the non-reflecting
boundary conditions when the algorithm used is time consistent. When, for
the purpose of accelerating convergence to steady state, one uses
time-inconsistent schemes (such as by employing local time-stepping), one has
to find a new set of β and ω. This requires further investigation.


¹ Final Technical Report, AFOSR Grant 82-0136, July 31, 1987.

² J.S. Danowitz, "A non-reflecting boundary condition for the compressible
Navier-Stokes equations for two-dimensional wake flows," M.Sc. thesis,
Tel-Aviv University, June 1988.
Compact High Order Schemes for the Euler Equations


It can be shown (9,11) that numerical approximations to the linearized
Euler equations of gas dynamics give rise to dispersive errors which in the
2-D supersonic case depend on a similarity parameter κ = (Δy/Δx) √(M² − 1)
(under the assumption v << u everywhere, where u and v are the x and y
components of the velocity vector). The difference between the dispersion
relations of any numerical algorithm and that of the original partial
differential equations can be plotted as curves in the Fourier plane with κ
as a parameter.

In particular the results in (9,11) indicate that for central-difference
schemes, the dispersive errors are contributed mostly by the third power of
the errors in the two Fourier variables. It is, therefore, natural to
think of fourth order spatially accurate algorithms as having better
dispersive properties. By utilizing the structure of the Euler equations one
can obtain, on a Cartesian grid, a fourth order approximation which instead
of using a 5 x 5 stencil (5 x 5 x 5 in 3-D) relies on a compact support of
3 x 3 (3 x 3 x 3 in 3-D). The combination of fourth order accuracy with
compact support reduces both total computer work and memory.

In (7) we construct an implicit approximate factorization (AF)
algorithm and a 4-step Runge-Kutta scheme which have the above properties
of compactness and 4th order accuracy. In particular the following points
should be noted:

o  The 2-D AF scheme is unconditionally stable, has a 3 x 3 stencil and
   at steady state has fourth order spatial accuracy. The temporal
   evolution is time accurate to either 1st or 2nd order through choice
   of a parameter.

o  In 3-D the AF scheme has almost the same characteristics as in 2-D
   except that it is now only conditionally stable, with the stability
   condition (the CFL number) being dependent on the "cell aspect ratios"
   Δy/Δx and Δz/Δx. The stencil is still compact and fourth order
   accuracy at steady state is maintained.

o  In the Runge-Kutta case it is shown how the Jameson-Schmidt-Turkel
   algorithms¹ may be easily modified to our form which has, in addition
   to the higher accuracy, markedly enhanced stability limits.

o  Numerical experiments on a 2-D shock reflection problem (using the
   AF scheme) show the expected improvement over lower order schemes,
   not only in accuracy (measured by the L2 error) but also in the
   dispersion.


¹ A. Jameson, W. Schmidt and E. Turkel, "Numerical solutions of the Euler
equations by finite volume methods using Runge-Kutta time-stepping
schemes," AIAA paper 81-1259, June 1981.
PERFORMANCE OF CFD ALGORITHMS ON PARALLEL PROCESSORS


The objective of this task is to explore how representative CFD
algorithms map onto different parallel processor architectures. A careful
analysis was made of an existing 3-D Euler solver using the Jameson
cell-centered finite volume method in order to profile the typical
computational loops in the program. The loops were characterized by the ratio
of floating point operations to memory references and the span of mesh points
(memory locations) required by the operation. Three characteristic loop
structures were identified.


1.  The first is represented by the calculation of the pressure given
    the values of the state vector. This has twice as many operations
    as memory references and involves data only at a single point in
    the grid (i,j,k).

2.  The second is represented by calculating the surface flux integrals
    for a cell given the flux vectors at cell centers. This operation
    has about the same number of floating point operations as memory
    references and involves the nearest neighbors of a mesh point (i,j,k
    and i±1, j±1, k±1).

3.  The third loop is represented by the tridiagonal matrix solver and
    requires about one floating point operation for every two memory
    references. It requires accessing a line of data through the mesh
    in each of three directions.

Although the analysis was done for an explicit Euler solver with residual
smoothing, the loop structures are typical of other CFD codes using
structured grids such as ARC3D.
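
To make the contrast between the first and third loop types concrete, the
sketch below shows a pointwise pressure evaluation and a tridiagonal solve
along one grid line. It assumes a perfect gas with γ = 1.4 and uses the
standard Thomas algorithm; the function names and argument layout are
illustrative and are not taken from the solver that was profiled.

```python
GAMMA = 1.4  # ratio of specific heats (perfect-gas assumption)


def pressure(rho, mx, my, mz, energy):
    """Loop type 1: uses the state vector at a single point (i,j,k) only.

    rho is density, (mx, my, mz) is momentum, energy is total energy
    per unit volume.
    """
    kinetic = 0.5 * (mx * mx + my * my + mz * mz) / rho
    return (GAMMA - 1.0) * (energy - kinetic)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Loop type 3: Thomas algorithm along one line of the mesh.

    lower[0] and upper[-1] are not part of the matrix and are ignored
    in the result. Each step depends on the previous one, so the sweep
    along the line is inherently sequential.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The pressure routine can be applied to every mesh point independently, while
the tridiagonal sweep carries a dependence from one point on the line to the
next; parallelism for the third loop type therefore has to come from solving
many lines at once.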

In the first application, three smaller loops representing these
computational tasks were written for two machines with distinctly different
architectures. The ALLIANT FX/8 has a shared memory and up to 8 processors
while the INTEL iPSC has a distributed memory with up to 32 nodes (for the
one we used). The loops are analyzed and reported in detail in [5,8]. A
paper summarizing the entire study was delivered at the 3rd International
Conference on Supercomputing in Boston during May 1988 [10]. A copy is
attached as Appendix 7.

In a second investigation [20] a 2-D version of the Jameson algorithm
was implemented on a data-flow machine simulator using a new programming
language called Id World. Whereas the ALLIANT and INTEL machines execute
in parallel at the loop or subroutine level, a data-flow machine executes
in parallel at the elementary instruction level. The investigation revealed
that CFD algorithms are suitable for data-flow architectures and as such
achieve much greater parallelism than on conventional machines now in
production.
TURBULENT SPOTS IN LAMINAR FLOW


A new set of measurements has been carried out to describe the transition
process of a laminar flow over a flat plate as it evolves from a small
amplitude wave-packet, which has been generated by a controlled short duration
air pulse, to the formation of a turbulent spot. The experiment was completed
under  the  same  conditions  as  an  earlier  first  set  of  data  with  the  exception 
that  the  initial  amplitude  level  of  the  wave-packet  was  increased.  The 
measurements  are  more  dense  along  the  downstream  direction  (X)  and  include 
a  Y  mapping  of  the  flow  field  which  was  absent  previously.  Both  streamwise 
and  spanwise  velocity  components  (U  and  W  respectively)  were  measured  with 
a  V-shaped  hot-wire  anemometer  while  the  transverse  velocity  component  (V) 
was  obtained  by  using  an  X  hot-wire  anemometer.  A  theoretical  model  which 
includes  the  flow  divergent  effect  and  a  nonlinear  analysis  to  capture  the 
subharmonic  stage  is  now  in  preparation  and  will  be  reported  on  at  a  later 
date.  A  summary  of  the  results  appears  in  Appendix  9. 
Abstract

A new problem solving strategy for complex CFD systems is described,
revolving around a new hybrid system which incorporates the advantages of
both conventional and expert systems. The paper describes the structure of
expert systems and subsequently their operation and illustrates the technique
through a local compressible flow analysis example. Comparison of the
operations for forward- and backward-chaining expert systems with
conventional systems leads to an understanding of the advantages and
disadvantages of each. Based upon these observations, a new hybrid system and
associated problem solving strategy is described. Application has been made
to complex grid adaptation of the two-dimensional Euler equations,
highlighting the high efficiency and flexibility of the hybrid system.


Introduction 

Recently, a great deal of attention has been focused on expert systems,
both because of their power and their mystique. Early systems, most notably
MYCIN[1] (for diagnosing bacteremia and meningitis infections) and DENDRAL[2]
(for predicting the molecular structure of compounds from mass spectrograms),
all dealt with expert domains in which conventional systems (for example
FORTRAN programs) have not been successful. This led some to make the
exaggerated claim that expert systems are sufficiently powerful so as to
replace conventional systems in the future. However, despite their utility as
a very powerful tool for some applications, their direct use for most CFD
problems would probably be unwise due to their very low efficiency at
repetitive numerical tasks.
To circumvent this difficulty, several CFD researchers have combined expert
system and conventional techniques, yielding hybrid systems. Three such
efforts are described here.

The first of these is the work of Tong[3] in which he uses an expert system
to automate the design process for cooling fans. His system basically
consists of two components: a conventional system to analyze a proposed
geometry and an expert system to alter the geometry based upon the observed
performance of the previous geometry and heuristic design rules gleaned from
a human design expert. Successive iterations between these two components
result in the evolution of an improved cooling fan.

The second example is the PAN AIR knowledge system by Conner and Purdon[4].
Here there are two major components: an expert system with which a user
consults in order to set up a proper input file for the second component,
the PAN AIR program, which is written using conventional techniques.

The third example is a zonal grid generation system discussed briefly by
Andrews[5]. Here an expert system is used to divide complex two-dimensional
flow fields into four-sided, well-shaped zones; a grid generation program
(which uses conventional programming techniques) then generates the final
computational grid within each zone.

In each of these systems, the conventional and expert system components are
rather loosely coupled; that is, there is a limited interaction between the
two components. Conner et al. and Andrews apply the two components
successively, with the expert system performing the initial processing and
then conventional components acting on the expert system's outputs. Even in
Tong's work, where he alternately applies the two types of components, the
components act somewhat independently.

In this paper a tightly coupled use of conventional and expert system
components is described. This new hybrid system architecture gains its power
by combining the advantages of each type of component at a lower level.




This paper begins by describing expert systems, both in terms of their
components and their operation. Through a case study, local compressible flow
analysis, the operation of an expert system is contrasted with the operation
of a conventional system, making the advantages and disadvantages of each
apparent. This then leads to the development of the general hybrid system
architecture which forms the framework for the MITOSIS adaptive grid
program[6]. The paper concludes with a full description of the hybrid
system's knowledge base that is used to accomplish adaptation, both as
initially set up and through various revisions to demonstrate the ease with
which experience can be built into the program.


Expert  Systems  —  Background 

Expert systems are computer systems which approach problems that are
normally associated with human experts by applying a reasoning mechanism to
a body of knowledge in their domain.

As shown in figure 1, all computer systems, whether expert or conventional,
are composed of three basic components:

Facts -- descriptions of the state of a physical or abstract
object or situation of interest. For example, the radius
of a circle or an employee's name.

Relationships -- statements concerning the interconnections
of facts. For example, the circumference of a circle
is related to its radius by the algebraic relation c = 2πr.

Control -- statements concerning the order in which the
various relationships should be applied to specific facts
in order to accomplish a task, typically the latter being
the creation of new facts.


In conventional systems, programs consist of two major components: the
variables which contain the facts and the program which contains both the
relationships and control. Typically relationships are coded as assignment
statements while control is coded implicitly through statement ordering
and explicitly as GOTOs, DO-loops, and subroutine calls. The essential point
is that the relationships and control are intertwined.

Figure 1: System organization

In expert systems on the other hand, the rules are combined with the facts
in the knowledge base and control is in a separate component known as the
inference engine. In this way, the domain dependent information (facts and
rules) is segregated from the domain independent information (control).

The facts in an expert system, not unlike variables in conventional
programs, have names and values such as airfoil-type: NACA0012 and
Mach-number: 0.80. As shown here, the values can be symbolic or numeric or
can have the value UNKNOWN. The operation of many expert systems relies
heavily on allowing values to be unknown and to be sought as part of the
solution. In addition to this simple representation of facts, other expert
systems include more complicated representations such as
object-attribute-value triplets, frames, and semantic nets[7].

The relationships in an expert system describe cause-and-effect or other
relationships between facts in the domain, known as rules. Expressed as
if-then statements, rules typically take the form:

    Rule:
        if:       premise clause 1
        andif:    premise clause 2
        then:     action clause 1
        andthen:  action clause 2

In general, premise clauses are simple relational operators which evaluate
to either TRUE, FALSE, or UNKNOWN. In order for a rule to be available for
execution, it cannot contain any false premise clauses.
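
The three-valued evaluation of premise clauses can be sketched as follows.
The clause representation (fact name, operator, value), the function names,
and the "angle-of-attack" fact used in the example are hypothetical choices
made for illustration; only the TRUE/FALSE/UNKNOWN outcome and the
availability test come from the description above.

```python
import operator

UNKNOWN = "UNKNOWN"
OPERATORS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}


def eval_clause(facts, clause):
    """Evaluate one premise clause to True, False, or UNKNOWN."""
    name, op, value = clause
    current = facts.get(name, UNKNOWN)
    if current == UNKNOWN:
        # The fact has no value yet, so the clause cannot be decided.
        return UNKNOWN
    return OPERATORS[op](current, value)


def rule_available(facts, premises):
    """A rule is available as long as none of its premises is False."""
    return all(eval_clause(facts, clause) is not False for clause in premises)


facts = {"airfoil-type": "NACA0012", "Mach-number": 0.80}
premises = [("Mach-number", ">", 0.7), ("angle-of-attack", "<", 5.0)]
```

Here `rule_available(facts, premises)` holds even though the second premise
is UNKNOWN, since no premise is actually false.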

The action part of a rule consists of one or more action clauses, each of
which either modifies facts in the knowledge base, performs a user-defined
function, or performs input and/or output. In some expert systems, it is even
possible to create new rules or disable old ones. The standard functions
which are supported by most expert systems include the arithmetic operations
as well as other common mathematical functions such as root extraction.
User-defined functions are generally just implementation-language functions
which are executed based upon facts supplied through the rules and which
generate new facts that are stored in locations specified by the rule.

The control in an expert system is done in the inference engine, whose
responsibility it is to decide which rules in the knowledge base are to be
fired (executed) and in what order; in other words, to reason with the
knowledge (the facts and relationships). There are two basic strategies used
in inference engines: forward- and backward-chaining.

In forward-chaining, which is also known as data-driven inferencing, the
rules are applied to the currently known facts such that all subsequently
determinable facts are found. This is accomplished through the following
steps:

search: search through all rules, listing only those which
have true premises. If none qualify, execution of the
inference engine halts.

select: if the search list contains more than one entry,
select the most dominant entry from the list. This step
is also known as conflict resolution.

act: perform the actions in the action clauses of the most
dominant rule on the list, resulting in a modified set of
facts in the knowledge base. The process of executing
action clauses in a rule is also known as firing the rule.

repeat: return to the search step.
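
The four steps above can be sketched as a short loop. This is only an illustrative sketch in Python, not the engine used in this work: the `Rule` class, the `forward_chain` function, the toy rules, the choice to skip rules whose target is already known, and the "earliest rule dominates" conflict resolution are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]


@dataclass
class Rule:
    name: str
    needs: Tuple[str, ...]            # premise: facts that must be known
    target: str                       # fact asserted by the action clause
    action: Callable[[Facts], float]  # computes the value of the target


def forward_chain(rules: List[Rule], facts: Facts) -> List[str]:
    """Search-select-act-repeat; returns rule names in firing order."""
    fired: List[str] = []
    while True:
        # search: list the rules whose premises are true (assumption: a
        # rule whose target is already known is not listed again)
        found = [r for r in rules
                 if all(n in facts for n in r.needs) and r.target not in facts]
        if not found:
            return fired              # nothing qualifies: the engine halts
        # select: conflict resolution (assumption: earliest rule dominates)
        chosen = found[0]
        # act: fire the rule, which modifies the set of facts
        facts[chosen.target] = chosen.action(facts)
        fired.append(chosen.name)
        # repeat: fall through to the next search


# Hypothetical toy knowledge base, unrelated to any real expert system.
toy_rules = [
    Rule("volume", ("area", "depth"), "volume", lambda f: f["area"] * f["depth"]),
    Rule("area", ("width", "height"), "area", lambda f: f["width"] * f["height"]),
]
toy_facts = {"width": 2.0, "height": 3.0, "depth": 4.0}
firing_order = forward_chain(toy_rules, toy_facts)
```

Although the "volume" rule is listed first, it cannot fire until the "area" rule has asserted its fact, so the data rather than the rule order determine the sequence of firings.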

The Digital Equipment Corporation uses an expert system
called XCON[8] for custom configuring VAX computer systems.
This synthesis task, in which component compatibility
is checked, power supplies and cabinets are assigned, and
cables are designed, is ideally suited for a forward-chaining
inference engine.

In backward-chaining, which is also known as goal-driven
inferencing, the procedure is inverse and begins with a goal.
The rules are applied backwards and those which assert a
value for the goal are selected. If the rules themselves reference
facts with currently unknown values, these facts become
new goals and the processing continues recursively.
This type of inferencing strategy is particularly appropriate
for classification and diagnostic problems. One of the first
true expert systems, MYCIN[1], is a backward-chaining system
whose domain of expertise is diagnosing bacteremia and
meningitis infections.

In both chaining systems, the inferencing mechanism can
be summarised by match-select-act-repeat. The match step
searches through all of the rules, seeking those which match
either current data or goals, the select step uses a conflict
resolution strategy to pick a dominant rule from those identified
in the match step, the act step performs the actions
in the then-clause of the selected rule, and the entire cycle
is then repeated.

The examination of all the rules (or a subset of them) in
the match step is what gives expert systems their power and
at the same time makes them inefficient for repetitive tasks
which can be defined procedurally. Acceleration strategies
to circumvent this problem (for example, context limiting[9]
and the Rete match algorithm[10]) have been developed but
unfortunately even their use results in a system which is less
efficient for some repetitive tasks than is the corresponding
conventional system.

attribute    description
a            speed of sound
A            cross-sectional area of streamtube
$\dot{m}$    mass flow rate
M            Mach number
p            static pressure
R            gas constant
T            static temperature
u            velocity
$\gamma$     ratio of specific heats
$\rho$       static density

Table 1: Initial set of attributes for the local compressible
flow analysis model problem

Model  Expert  Systems 

Consider two expert systems, one forward-chaining and
one backward-chaining, which use a common knowledge
base to attack a model fluid mechanics problem. Though
the expertise here consists mainly of simple algebraic substitutions,
the examples do highlight the real strengths and
weaknesses of expert systems through the comparison of
their operation with that of conventional procedural systems.

The domain of expertise of these model expert systems is
local compressible flow analysis, that is those aerodynamic
and thermodynamic variables describing the relationships
governing the flow at a point in a streamtube. Initially the
conditions will be characterised by the attributes given in
table 1 and the relationships in table 2. It is assumed that
the values and relationships are all written in a consistent
set of units (or are non-dimensional).

Since there is no provision for algebraic manipulations in
the inference engine as used here, each of the relationships
given in table 2 must be written as a series of rules, leading
to the rule set given in table 3. The attributes $\gamma$ and R
were considered parameters whose values would be assigned
a priori by the user and hence were not solved for in this
set of rules, although one could certainly do that if desired.

The appropriate form of the rule to implement $p = \rho R T$
is given by:

Rule 1:   If:
          then: $p \Leftarrow \rho R T$


relationship              description
$p = \rho R T$            state equation
$a^2 = \gamma R T$        speed of sound relation
$\dot{m} = \rho u A$      mass flow rate definition
$M = u / a$               Mach number definition

Table 2: Initial set of relationships for the local compressible
flow analysis model problem


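rule number    rule statement
1              $p = \rho R T$
2              $\rho = p / (R T)$
3              $T = p / (\rho R)$
4              $a = \sqrt{\gamma R T}$
5              $T = a^2 / (\gamma R)$
6              $\dot{m} = \rho u A$
7              $\rho = \dot{m} / (u A)$
8              $u = \dot{m} / (\rho A)$
9              $A = \dot{m} / (\rho u)$
10             $M = u / a$
11             $u = M a$
12             $a = u / M$

Table 3: Initial set of rules for the local compressible flow
analysis model problem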


The lack of premises in rule 1 is tied to the lack of restrictions
in using the relationship $p = \rho R T$; physically the only
restriction for using this rule is that the fluid is thermally
perfect. If one wanted to extend the present knowledge base
to include thermally imperfect gases, then physical restrictions
would show up as premise clauses in the rules. As a
result, rule 1 would be written as the two rules:

Rule 1a:  If:   gas is perfect
          then: $p \Leftarrow \rho R T$

Rule 1b:  If:   gas is imperfect
          then: $p \Leftarrow \ldots$


This illustrates one of the real powers of expert systems,
that is, that the rules in the knowledge base should be concerned
with representing the physics and not the control
flow. Concerns about whether or not a rule applies to the
current situation (except for physical restrictions embodied
in premise clauses) are addressed by the controlling mechanism
in the inference engine.

It  is  instructive  now  to  examine  the  operation  of  the 
forward-chaining  inference  engine  when  applied  to  the 
knowledge  base  given  above  in  tables  1  and  3.  Since  the 
order  of  rule  application  is  heavily  dependent  on  the  data, 
two  cases  will  be  examined,  their  difference  being  the  initial 
set  of  known  facts. 

In the first case, the attributes T, R, $\gamma$, M, $\dot{m}$, and A are
assumed to initially have known values, yielding the locus
of knowledge illustrated in figure 2. Facts are represented
by circles containing the attribute name and rules (which
are numbered) are represented by groups of vertical and
diagonal links between the circles. For example the three
links emanating from the top of a in the figure indicate that
a can be asserted using rule 4 given known values of T, R,
and $\gamma$.

The initial set of facts is shown across the top of the figure
in a somewhat arbitrary order. As a result of the search
step in the first cycle, only rule 4 was triggered (met the
search criteria) to yield the value of a, the speed of sound.
Thus after the first cycle, the parameters with known values
are those shown on the top two rows of the figure. In like
manner, the second cycle resulted in a value for u through
the triggering and firing of rule 11. This figure clearly shows
that in order to determine the value of $\rho$ from the initially
known attributes with the original set of rules, one must
first compute values for a and then u.

Figure 2: Locus of knowledge for first forward-chaining example

In the second example, the initial set of facts consists of
M, $\gamma$, R, T, p, and A; the resulting locus of knowledge
(shown in figure 3) differs significantly from the previous
case.

Figure 3: Locus of knowledge for second forward-chaining
example

In summary, both forward-chaining expert systems start
with an initial set of facts and then use the rules to assert
all possible consequent facts. The exact locus of knowledge
is heavily dependent on the initial facts and to a somewhat
lesser degree on the conflict resolution strategy chosen.

Figure 4: Locus of knowledge for backward-chaining example


Consider now a backward-chaining inference engine applied
to the local compressible flow knowledge base with
the same initial set of known facts as in the first example
above (T, R, $\gamma$, M, $\dot{m}$, and A). For an inference engine
invocation with p as the goal, the locus of knowledge is given
in figure 4. Here facts with known values are denoted by
an attribute name in a circle, while those whose values are
UNKNOWN are not circled. Rules are denoted in the same
way as in previous knowledge loci.

For this case, the original goal (p) is shown on the bottom
row of the figure. The search step in the first backward-chaining
cycle finds rule 1, which asserts a value for p. However
in order to fire, a value must be known for $\rho$. Therefore
$\rho$ is made the new goal and the process repeats. At
this point there are two rules, 2 and 7, but rule 2 must be
excluded since it requires a value for p which is already a
goal. The cycling continues until rule 4 is found as a means
of calculating a value for a. Since all its inputs are known,
it can be fired, making the value of a available so that rule
11 fires, and so on. Eventually the value for the original goal p is
determined.
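
The goal-driven search just described can be sketched as a short recursive routine. This is an illustrative sketch only: the rule numbers follow table 3 as read here, the attribute names `rho`, `gamma`, and `mdot`, the function `solve`, and the numerical values are assumptions for the example, and "first applicable rule" stands in for whatever conflict resolution strategy a real engine would use.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

Facts = Dict[str, float]
Rule = Tuple[int, str, Tuple[str, ...], Callable[[Facts], float]]

# (rule number, asserted attribute, required attributes, formula)
RULES: List[Rule] = [
    (1, "p", ("rho", "R", "T"), lambda f: f["rho"] * f["R"] * f["T"]),
    (2, "rho", ("p", "R", "T"), lambda f: f["p"] / (f["R"] * f["T"])),
    (3, "T", ("p", "rho", "R"), lambda f: f["p"] / (f["rho"] * f["R"])),
    (4, "a", ("gamma", "R", "T"),
     lambda f: math.sqrt(f["gamma"] * f["R"] * f["T"])),
    (5, "T", ("a", "gamma", "R"), lambda f: f["a"] ** 2 / (f["gamma"] * f["R"])),
    (6, "mdot", ("rho", "u", "A"), lambda f: f["rho"] * f["u"] * f["A"]),
    (7, "rho", ("mdot", "u", "A"), lambda f: f["mdot"] / (f["u"] * f["A"])),
    (8, "u", ("mdot", "rho", "A"), lambda f: f["mdot"] / (f["rho"] * f["A"])),
    (9, "A", ("mdot", "rho", "u"), lambda f: f["mdot"] / (f["rho"] * f["u"])),
    (10, "M", ("u", "a"), lambda f: f["u"] / f["a"]),
    (11, "u", ("M", "a"), lambda f: f["M"] * f["a"]),
    (12, "a", ("u", "M"), lambda f: f["u"] / f["M"]),
]


def solve(goal: str, facts: Facts, pending: FrozenSet[str],
          fired: List[int]) -> bool:
    """Try to establish `goal`, recursing on unknown inputs as new goals."""
    if goal in facts:
        return True
    pending = pending | {goal}
    for number, target, inputs, formula in RULES:
        if target != goal:
            continue
        # exclude a rule that requires a fact which is already a goal
        if any(name in pending for name in inputs):
            continue
        if all(solve(name, facts, pending, fired) for name in inputs):
            facts[goal] = formula(facts)   # all inputs known: fire the rule
            fired.append(number)
            return True
    return False


# Illustrative values in a consistent (SI-like) set of units.
facts = {"T": 300.0, "R": 287.0, "gamma": 1.4, "M": 0.5, "mdot": 2.0, "A": 0.1}
fired: List[int] = []
found = solve("p", facts, frozenset(), fired)
```

With these inputs the rules fire in the order 4, 11, 7, 1, which is the chain traced in the text: the speed of sound first, then the velocity, then the density, and finally the pressure.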

One of the particularly nice features of expert systems is
that the separation of knowledge from the control makes it
relatively easy to add new knowledge and hence new capability.
For example in the local compressible flow example,
stagnation pressure ($p_0$) and stagnation temperature ($T_0$)
are easily added simply by defining two new attributes and
by adding new rules. These new rules, which are listed in
table 4, are derived from the energy equation and isentropic
relations.

rule number    rule statement
13             $T_0 = T \left[ 1 + \frac{\gamma - 1}{2} M^2 \right]$
14             $T = T_0 \left[ 1 + \frac{\gamma - 1}{2} M^2 \right]^{-1}$
15             $M = \left[ \frac{2}{\gamma - 1} \left( \frac{T_0}{T} - 1 \right) \right]^{1/2}$
16             $p_0 = p \left( T_0 / T \right)^{\gamma / (\gamma - 1)}$
17             $p = p_0 \left( T / T_0 \right)^{\gamma / (\gamma - 1)}$
18             $T_0 = T \left( p_0 / p \right)^{(\gamma - 1) / \gamma}$
19             $T = T_0 \left( p / p_0 \right)^{(\gamma - 1) / \gamma}$

Table 4: Rules added to the local compressible flow analysis
model problem


Figure 5: Locus of knowledge for forward-chaining example
with extended knowledge base


With this expanded set of rules, the initial set of facts
from figure 3 now produces the locus of knowledge shown
in figure 5, where the dotted lines simply denote the newly
added rules. It should be noted that the searches in cycles
1 through 4 all contained two rules in their conflict set,
the rule which fires and rule 13; the firing of rule 13 was
arbitrarily delayed until cycle 5 by the conflict resolution
strategy.


Comparison  of  Expert  and  Traditional 
Systems 

The previous sections described expert systems both in
terms of their components and through the examination of
their operation when applied in various ways to a model
problem. Now consider the characteristics of expert systems
in contrast to traditional procedural systems in an
attempt to elucidate the strengths and weaknesses of the
two approaches.

Virtually all of the differences between expert and procedural
systems grow out of their structural differences. Recall
that expert systems consist of two parts: a knowledge
base containing both facts (attributes) and relationships
amongst the facts (rules), and a control strategy (inference
engine) which models one of a few general problem solving
strategies. On the other hand, procedural systems are segmented
differently: the variables (data) and the program
which includes both relational and control information.

One implication of this structural difference is that the
separation of rules from control in expert systems allows
relationships to be written in the form of cause-and-effect
statements which directly represent physical laws and/or
heuristics. As a result, the domain knowledge is explicitly
stated in the rules, and modifications to the underlying
physical model or extensions to the system's domain of
expertise are more easily accomplished with an expert system
than is possible with a traditional procedural system in
which the relationships are intermixed with control.

A side benefit is that since the knowledge is explicitly
represented in the rules, it is relatively straightforward to
trace the locus of knowledge through the expert system.
Sometimes this is even done automatically by including an
explanation facility in the inference engine where the user
can ask questions such as 'why is a value required for this
attribute?' or 'how was this conclusion reached?'. The explanation
facility answers such questions by tracing through
the rule and/or goal stacks created internally by the inference
engines or by tracing through a list of rules which were
fired.

Another important implication of the separation of relationships
and control in an expert system is that the controller
(inference engine) can directly mimic one of the problem
solving strategies used most frequently by experts. Just
as human experts sometimes attack a problem forward from
the initial set of facts, using all appropriate physical laws
and heuristics, so too will a forward-chaining inference engine
work forward, choosing the appropriate rules to develop
all consequences of the initial data. On the other hand,
problems in which human experts work backward through
physical laws and heuristics in search of values for a particular
goal are well suited to backward-chaining inference
engines. Of course, for efficiency some domains require a
combination of the two basic strategies (perhaps applied
iteratively), resulting in inference engines which combine
elements of both forward- and backward-chaining.

The essential point is that either type of inference engine
applies only those rules which are appropriate to the current
situation, ignoring those in the knowledge base which
are currently superfluous. Thus one can start with virtually
any set of facts and a forward-chainer will adapt its
application of the rules to match the given initial set. Similarly,
a backward-chainer will adapt its search through the
rules based upon the desired goal as well as the initially prescribed
data. In summary, expert systems do not require a
priori knowledge of which parameters are inputs and which
are outputs.
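
This adaptiveness can be illustrated by running one unchanged rule table against two different sets of initial facts. The sketch below is illustrative only: it uses a subset of the local compressible flow rules (numbered as in table 3 as read here), the names `rho`, `gamma`, `mdot`, and `forward_chain` are assumptions, the numerical values are arbitrary, and "lowest rule number dominates" is an assumed conflict resolution strategy.

```python
import math
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]
Rule = Tuple[int, str, Tuple[str, ...], Callable[[Facts], float]]

# (rule number, asserted attribute, required attributes, formula)
RULES: List[Rule] = [
    (1, "p", ("rho", "R", "T"), lambda f: f["rho"] * f["R"] * f["T"]),
    (2, "rho", ("p", "R", "T"), lambda f: f["p"] / (f["R"] * f["T"])),
    (4, "a", ("gamma", "R", "T"),
     lambda f: math.sqrt(f["gamma"] * f["R"] * f["T"])),
    (6, "mdot", ("rho", "u", "A"), lambda f: f["rho"] * f["u"] * f["A"]),
    (7, "rho", ("mdot", "u", "A"), lambda f: f["mdot"] / (f["u"] * f["A"])),
    (11, "u", ("M", "a"), lambda f: f["M"] * f["a"]),
]


def forward_chain(facts: Facts) -> List[int]:
    """Fire rules until no new fact can be asserted; return firing order."""
    fired: List[int] = []
    while True:
        ready = [r for r in RULES
                 if r[1] not in facts and all(n in facts for n in r[2])]
        if not ready:
            return fired
        number, target, _inputs, formula = ready[0]  # lowest number dominates
        facts[target] = formula(facts)
        fired.append(number)


# Same rules, two different choices of which attributes are "inputs".
case_1 = {"T": 300.0, "R": 287.0, "gamma": 1.4, "M": 0.5, "mdot": 2.0, "A": 0.1}
case_2 = {"T": 300.0, "R": 287.0, "gamma": 1.4, "M": 0.5, "p": 101325.0, "A": 0.1}
order_1 = forward_chain(case_1)
order_2 = forward_chain(case_2)
```

In the first case the mass flow rate is an input and the pressure is an output; in the second the roles are reversed. No rule was rewritten to accommodate the change, only the order of firing differs.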

An unfortunate consequence of this adaptiveness is that
for tasks with known sets of inputs and outputs, expert systems
are slow relative to procedural systems. This is due
primarily to the search which has to be undertaken in each
cycle of the inference engine. Additionally, tasks which
can be described procedurally, that is tasks which follow a
predefined recipe of operations and flow control based upon
simple comparisons, cannot be easily prescribed in an expert
system since by design the rules can reference facts but not
other rules. Hence for applications which require intensive
amounts of repetitive calculations which can be defined procedurally,
the relative inefficiency of expert systems makes
them unsuitable.

It should be noted again that expert and procedural systems
are simply alternative strategies for problem solving.
Either strategy can be used to solve any problem even
though the other may be easier to implement or may be
more efficient. In general, problems which require a large
amount of repetitive calculations (which can be prescribed
algorithmically) are better suited for procedural systems;
those for which the domain knowledge grows and/or those
in which the types of inputs are not known a priori are better
suited for expert systems.


A  Hybrid  Expert/Procedural  System 

The basic requirements of complex numerical processes,
particularly those in Computational Fluid Dynamics
(CFD), may be examined in order to determine the relative
suitability of expert and procedural systems. It turns out
that neither is completely satisfactory by itself, leading to
the development of a hybrid system.

As used here, the term complex numerical process refers
to a procedure which meets the following criteria:

• it is composed of on the order of ten sub-processes;

• its sub-processes are generally rather complicated, each
  requiring thousands, millions, or even more operations;

• its sub-processes are generally numeric in nature, that
  is mostly composed of simple arithmetic operations and
  simple logical tests;

• its sub-processes are relatively independent of each
  other, that is they may be viewed as black boxes with
  the only communication between them being through
  a prescribed set of inputs and outputs; and

• the order in which the sub-processes are executed is not
  fixed but instead changes depending on results from the
  various sub-processes.

If more than order ten sub-processes are present, one could
abstract the sub-processes at a higher level, essentially creating
a hierarchy of complex numerical processes.

Figure 6: Components of a typical wing design program

As an example of a complex numerical process, consider
the design of an aircraft wing as depicted in figure 6. At the
center of the figure is a large pool of data, containing all geometric
and aerodynamic properties which describe the wing
and its operation. Data in the central pool might contain
the shape of the wing (both planform and sections), free-stream
flight conditions, spanwise loading distribution, and
pressure distributions, etc. Values for many of the properties
in the data pool may be unknown at early stages in
the design process, awaiting application of the various
sub-processes.

Eleven such sub-processes are shown surrounding the central
data pool in figure 6. Included in this list are two- and
three-dimensional analysis processes, processes to convert
two-dimensional to three-dimensional data and vice versa,
design modules, and graphics processors. Those included
are not meant to form an exhaustive set but are typical
sub-processes.

All sub-processes communicate with the central data pool
via a two-way link. Additionally, a sub-process cannot communicate
directly with any other but instead must deposit
information into the central data pool for another sub-process
to extract. Though this communication stricture
may seem overly severe, it leads to substantial benefits in
the design and development of the overall process. First,
the function of each sub-process is completely independent
of other sub-processes, and can be developed and tested
individually. Second, because the inner workings of each
sub-process are unknown to the other sub-processes, the algorithm
followed can be changed as long as the outputs remain
identical.

The exact sequence of sub-processes used by a wing designer
cannot be prescribed a priori but instead is highly
dependent on the data which is output from prior
sub-processes. Very often the choice of which sub-process to
employ is heuristic in nature. Economical, political, and
technological concerns all impact the choice of sub-processes
which the designer chooses. For example, three-dimensional
viscous analyses are currently prohibitively expensive and
may be used only in the most necessary cases. Similarly,
though two-dimensional analyses are relatively inexpensive,
their use is somewhat limited by the assumptions made in
their development.

The  development  of  a  complex  numerical  process  such  as 
for  the  above  wing  design  example  can  be  accomplished  in 
a  variety  of  ways;  here  two  competing  techniques  will  be 
explored:  procedural  and  expert  systems. 

First consider the traditional, procedural system. For the
execution of the algorithmically-based sub-processes, the
procedural system will be highly efficient, primarily due to
the fact that procedural systems compile directly into machine
instructions which more or less directly mimic the
algorithm.

At the same time, the data-sensitivity necessary to properly
determine the order of execution of the sub-processes
is difficult to implement very well in a procedural system.
For a few sub-processes and a few possible conditions, this
can be handled with some IF-tests and GOTOs. However
as the number of sub-processes and possibilities increases,
the logic quickly becomes unmanageable, often making extensions
nearly impossible to implement. Many large systems
have been rendered inoperative (at least temporarily)
by adding just one small new feature somewhere else
in the system because the reasons why and when certain
sub-processes were executed were obscured by the logical
structure of the controller.

Second, consider using an expert system. Here the controlling
logic concerning the execution order of the various
sub-processes is very easy to implement and extend through
the use of expert system rules. The logic is easy to decipher
because it is represented explicitly in the rules.

Figure 7: Organisation of hybrid system

Unfortunately the execution of the algorithms in the
sub-processes is considerably less efficient in an expert system
than in a procedural system, mostly due to the search step
in expert systems.

Clearly  neither  procedural  nor  expert  systems  alone  is 
completely  suitable  for  complex  numerical  processes  and 
hence  a  new  hybrid  system  is  proposed  here,  as  shown  in 
figure  7. 

At the center is the data pool which contains all information
to be shared amongst the procedures. Typically the
data consists of some global parameters which describe the
current problem as well as arrays which describe distributions.

Surrounding the data pool are a small number of independent
procedures which generally are both numerical and
algorithmic in nature, making them ideal candidates for procedural
system techniques.

The final component is the expert system (shown in figure 7
as a ring encircling the procedures), the purpose of
which is to control the flow of execution through the system.
As such it passes control information into and receives
status information out of the various procedures.

Both the flexibility and efficiency of the hybrid system
compare favorably with those of expert and procedural systems
when applied to complex numerical processes. On the
one hand, since the majority of computer resources (CPU
time) are consumed by the execution of the sub-processes,
the efficiency of the hybrid system is essentially the same as
the efficiency of procedural systems and much greater than
that of an expert system. On the other hand, since the control
of the execution order of the sub-processes is governed
by rules in the expert system, the flexibility and data sensitivity
of the hybrid system are much the same as those of an expert
system and much greater than is possible with a procedural
system.

The structure of the hybrid system developed here is different
from the hybrid systems of Tong, Conner and Purdon,
and Andrews in one important respect: the level at which
the expert and conventional components are coupled. In
the other three systems, the components are coupled at a
very high level, that is a brief conceptual block diagram of
the data and processing flows through these hybrid systems
would include a very small number of blocks which were
conventionally programmed and others which included expert
systems. The two types of components are of roughly
equal stature in that each contains processing and control
functions.

On the other hand, the hybrid system developed here is
coupled at a much lower level; the conventional procedures
are responsible for all CPU-intensive functions and the expert
system serves a purely supervisory function, that is
it controls the order of execution of the various processes
based upon status information passed back from the processes
to the expert system. This separation of processing
and control is a major benefit of the new hybrid system
approach.

Application  of  Hybrid  System  to  Grid 
Adaptation 

The effectiveness of the hybrid system approach has been
demonstrated by applying it to a two-dimensional grid
adaptation program, the MITOSIS system[11]. This section
begins with a brief overview of the grid adaptation problem
along with typical computed results from the MITOSIS
program. A discussion of how MITOSIS fits into the hybrid
system strategy follows. The section concludes with a
description of the initial MITOSIS knowledge base as well
as additions which were included as experience accumulated
through the use of the MITOSIS system.

The  Grid  Adaptation  Problem 

The conflicting requirements of accuracy and efficiency
dictate that advanced calculation methodologies be employed
for complex flow-fields. One method of coping with
the accuracy/efficiency dilemma is to use an adaptive grid
technique such as MITOSIS in which an Euler flow is solved
using a Lax-Wendroff-type integration scheme on a succession
of grids. For simple cases, the scheme consists of the
following steps:

1. initialize the fixed, coarse global grid using any standard
grid generation method

2. integrate the Euler equations forward in time on the
current grid until the residual has fallen to some specified
level and until the global convergence parameters
(such as lift and drag coefficients) have stabilized

3. stop if the converged solution is sufficiently close to the
converged solution on the previous grid, otherwise

4. search the flow field for local flow features, that is
regions where the changes in flow-field quantities are
large (for example shocks and stagnation regions)

5. divide cells near flow features and fuse (un-adapt) cells
away from flow features, so as to concentrate the computational
grid where required

6.  go  back  to  step  2 

More complicated cases require a slightly different set of
steps as described below.
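
The simple-case cycle listed above can be mimicked in one dimension. The sketch below is a toy illustration under strong assumptions and is not the MITOSIS algorithm: sampling a known function replaces the Euler integration of step 2, a trapezoidal integral plays the role of the global convergence parameter, the jump across a cell serves as the feature detector, and cell fusing is omitted. The names `solve_on_grid`, `integral`, `adapt`, and `front` are hypothetical.

```python
import math
from typing import Callable, List


def solve_on_grid(nodes: List[float], f: Callable[[float], float]) -> List[float]:
    """Stand-in for step 2: 'solve' by sampling a known function."""
    return [f(x) for x in nodes]


def integral(nodes: List[float], values: List[float]) -> float:
    """Trapezoidal integral, used here as the global convergence parameter."""
    return sum(0.5 * (values[i] + values[i + 1]) * (nodes[i + 1] - nodes[i])
               for i in range(len(nodes) - 1))


def adapt(f: Callable[[float], float], n_coarse: int = 4, jump: float = 0.1,
          tol: float = 1e-3, max_cycles: int = 10) -> List[float]:
    nodes = [i / n_coarse for i in range(n_coarse + 1)]       # step 1
    previous = None
    for _ in range(max_cycles):
        values = solve_on_grid(nodes, f)                       # step 2
        current = integral(nodes, values)
        if previous is not None and abs(current - previous) < tol:
            break                                              # step 3
        previous = current
        refined = [nodes[0]]
        for i in range(len(nodes) - 1):
            # step 4: a large change across the cell marks a flow feature
            if abs(values[i + 1] - values[i]) > jump:
                # step 5: divide the cell at its midpoint (no fusing here)
                refined.append(0.5 * (nodes[i] + nodes[i + 1]))
            refined.append(nodes[i + 1])
        nodes = refined                                        # step 6
    return nodes


def front(x: float) -> float:
    """A smooth profile with a steep, shock-like front near x = 0.3."""
    return math.tanh(50.0 * (x - 0.3))


grid = adapt(front)
widths = [grid[i + 1] - grid[i] for i in range(len(grid) - 1)]
```

For this profile the cells away from the front keep their coarse width while the cells around x = 0.3 are repeatedly halved, which is the concentration of grid points that the adaptive scheme is meant to achieve.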

Figures 8 and 9 show the original and final (after five
adaptation cycles) computational grids for an RAE-2822
airfoil at a free-stream Mach number of 0.75 and three degrees
angle of attack. The differences in the Mach number
distributions for these two cases can be seen clearly in figure 10,
indicating that the increased resolution, at least in
the vicinity of the shock, is required for this case. The Mach
number distribution obtained with the adapted grid is the
same as that obtained using a globally refined grid, but required
about 100 times less computer time.


Figure 8: Initial computational grid, RAE-2822,
$M_\infty = 0.75$, $\alpha = 3^\circ$.

Figure 9: Final adapted computational grid, RAE-2822,
$M_\infty = 0.75$, $\alpha = 3^\circ$.

Figure 10: Mach number distributions (before and after
adaptation), RAE-2822, $M_\infty = 0.75$, $\alpha = 3^\circ$.

MITOSIS as a Hybrid System

The application of the hybrid system strategy to MITOSIS
is shown schematically in figure 11. As with all hybrid
systems, the central data pool is surrounded by application
procedures, which in turn are surrounded by the expert
system whose responsibility it is to schedule (control)
the execution of the procedures. A brief description of each
of these components follows; a complete discussion is contained
in [11].

Figure 11: Organisation of MITOSIS

The center of the figure contains the central data pool
in which data which is to be communicated amongst the
procedures is stored. For MITOSIS this includes nodal arrays
containing the dependent and independent variables
throughout the flow field, data structure arrays which specify
the interconnections of the nodes as well as information
used within the Euler integrator, problem parameters (such
as free-stream reference values, smoothing coefficients), and
convergence information (for example, iteration number,
normalised changes in the dependent variables from the previous
iteration, etc.).

The procedures which surround the central data pool are
those which consume the vast majority of the computational
resources. The procedures in the figure, which are written
using conventional programming techniques, include E2CTRL
(the Euler integrator initialisation routine), E2MAIN (the
procedure which actually integrates the Euler equations one
iteration), F2GRAD (the procedure which computes the first
difference of the refinement parameter), and G2VRXT (the
procedure which writes the current data structure to a disk
file).


The expert system is responsible for scheduling the execution
of the various application procedures. The inferencing
mechanism used here is forward-chaining, with the conflict
resolution strategy being such that the rule with the greatest
number of premise clauses dominates.
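
The scheduling mechanism described above can be sketched in a few lines. The following is a minimal illustration, not the MITOSIS inference engine itself: the `Rule` class, `select_rule`, `run` and `demo` are hypothetical names, and the "solver" is a stand-in that simply reports convergence after three iterations. It shows forward-chaining in which, among all rules whose premises hold, the rule with the most premise clauses fires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, object]


@dataclass
class Rule:
    name: str
    premises: List[Callable[[Facts], bool]]  # all must hold for the rule to match
    actions: List[Callable[[Facts], None]]   # executed in order when the rule fires


def select_rule(rules, facts):
    """Conflict resolution: of the matching rules, the one with most premises wins."""
    matching = [r for r in rules if all(p(facts) for p in r.premises)]
    if not matching:
        return None
    return max(matching, key=lambda r: len(r.premises))


def run(rules, facts, max_firings=1000):
    """Forward-chain until no rule matches; return the names of the fired rules."""
    trace = []
    for _ in range(max_firings):
        rule = select_rule(rules, facts)
        if rule is None:
            break
        for action in rule.actions:
            action(facts)
        trace.append(rule.name)
    return trace


def demo():
    def in_context(name):
        return lambda f: f["context"] == name

    def set_context(name):
        return lambda f: f.__setitem__("context", name)

    def init(f):
        f.update(iters=0, conv_check=0, context="integrate")

    def iterate(f):
        f["iters"] += 1
        # Stand-in for the integrator and convergence check (assumption):
        # pretend the solution converges after three iterations.
        f["conv_check"] = 1 if f["iters"] >= 3 else 0

    rules = [
        Rule("SET CONTROL AND INDEX VECTORS", [in_context("initial")], [init]),
        Rule("TAKE ONE ITERATION", [in_context("integrate")], [iterate]),
        Rule("CONVERGENCE DETECTED",
             [in_context("integrate"), lambda f: f["conv_check"] == 1],
             [set_context("converged")]),
    ]
    facts = {"context": "initial"}
    return run(rules, facts), facts
```

In this sketch the engine stops as soon as the context has no associated rules, and the two-premise convergence rule pre-empts the one-premise iteration rule once `conv_check` is set.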

The  Initial  MITOSIS  Knowledge  Base 

MITOSIS’  knowledge  base  consists  of  two  parts:  the 
facts  and  the  rules.  The  facts  contain  constants  (the  in¬ 
put/output  units  of  the  printer,  etc.)  as  well  as  those 
which  vary  through  the  execution  of  the  inference  en¬ 
gine.  Amongst  these  are  the  current  convergence  sta¬ 
tus  (conv-check),  the  lift  coefficient  (clift),  the  division 
threshold  (thr-divd),  and  the  number  of  adapted  cells 
(ncells). 

The initial MITOSIS knowledge base consisted of 10 rules
which are described in [11]; following is an abridged list
which serves to illustrate the controlling role of the expert
system. The rules are formatted such that lower-case words
represent variables and upper-case words represent fixed control
words. As with all computer processing, it is advantageous
to break the control problem into a series of smaller tasks,
herein called contexts. One could consider these contexts to
serve somewhat the same purpose as subroutines and procedures
in conventional computer systems. The MITOSIS
knowledge base initialises the context to initial.

Each  time  that  a  new  grid  is  established  (either  initially 
or  after  an  adaptation),  various  initialisations  are  required. 
These  are  the  subject  of  the  first  rule: 

RULE     SET CONTROL AND INDEX VECTORS
IF       context .EQ. "initial"
THEN     E2CTRL
ANDTHEN  G2SUMY terminal
ANDTHEN  E2STAT printer
ANDTHEN  SET iters *0.0
ANDTHEN  SET conv-stage *0.0
ANDTHEN  SET conv-check *0.0
ANDTHEN  SET context "integrate"

The actions performed by this rule are: set up the control
and index vectors for the Euler integrator (E2CTRL); output
the grid and solver status information (G2SUMY and
E2STAT); and initialise the attributes iters, conv-stage,
and conv-check. The rule concludes by switching the context
to integrate. Since this is the only rule associated
with context initial, it is always the first rule to fire.

Rules  with  the  context  Integrate  are  associated  with  in¬ 
tegrating  the  Euler  equations  to  steady  state  on  a  given 
grid.  There  are  four  rules  in  this  context,  one  each  for  the 
four  major  types  of  processing  which  is  to  be  done  in  this 
context.  The  first  rule  in  this  context: 

RULE     TAKE ONE ITERATION
IF       context .EQ. "integrate"
THEN     E2MAIN delta-du clift cdrag
ANDTHEN  E2WIND conv-stage conv-check
ANDTHEN  E2HSPR
ANDTHEN  ADD iters iters *1.0

is  the  one  which  calls  E2MAIN  to  integrate  the  Euler  equa¬ 
tions  one  iteration.  It  then  calls  the  convergence  check¬ 
ing  algorithm  (E2WIND),  outputs  convergence  information 
for  the  iteration  just  taken  (E2HSPR),  and  increments  the 
number  of  iterations  taken  on  this  grid. 

The  second  rule  in  context  integrate  is: 

RULE     CONVERGENCE DETECTED
IF       context .EQ. "integrate"
ANDIF    conv-check .EQ. *1.0
THEN     SET context "converged"

and is responsible for noting that the Euler integrator has
converged on the current grid by monitoring the value of
conv-check, which is updated each time the rule discussed
previously is fired. Upon noting convergence, this rule
switches the current context to converged. This rule will
always dominate the first rule in this context by virtue of the
fact that it has two premises whereas the other has only
one.

There are rules similar to the latter which are concerned
with detecting that the maximum number of allowable iterations
on any one grid has been exceeded, as well as that the
iteration process on the grid has diverged. In each such case
the solution is dumped to a disk file and the processing
halts.

Another context, adapt, consists of three rules which are
associated with the grid adaptation process. In like manner
to the rules in context integrate, the proper operation of
the rules in this context requires that the rule with the most
premises dominates in the conflict resolver.

The first rule in this context is responsible for adapting
the grid, and is given by:

RULE     ADAPTATION WITH AUTOMATIC THRESHOLDS
IF       context .EQ. "adapt"
THEN     ADD levels levels *1.0
ANDTHEN  E2DATA density
ANDTHEN  F2GRAD
ANDTHEN  F2THRS thr-divd thr-fuse
ANDTHEN  F2ADPT thr-divd thr-fuse ncells
ANDTHEN  SHOW ncells
ANDTHEN  SET context "initial"

This rule begins by incrementing levels, the number of
grid adaptation levels. It then calls E2DATA to write a
copy of the density into the work array which F2GRAD
subsequently uses to set up the refinement parameter (the
first-difference of density). The division and fusion thresholds
which are computed next (by F2THRS) are then passed
into F2ADPT, which actually adapts the grid. The execution
of this rule completes by outputting ncells and switching
the context to initial to start another cycle.

The determination that the solutions on successive grids
are essentially the same, and thus that further adaptation is not
required, is the purpose of the next rule, given by:

RULE     FINAL CONVERGENCE REACHED
IF       context .EQ. "adapt"
ANDIF    delta-cl .LE. clift-tol
ANDIF    delta-cd .LE. cdrag-tol
THEN     G2WRIT final-soln
ANDTHEN  E2PRNT
ANDTHEN  SET context "return"

When this rule fires, it writes the final solution to unit
final-soln and switches the context to return, for which
there are no rules, and thus the inference engine stops.

The  final  rule  in  this  context  ensures  that  the  number  of 
levels  of  adaptation  does  not  exceed  some  specified  maxi¬ 
mum. 

These  rules  simply  describe  the  adaptation  process  enu¬ 
merated  above.  As  can  be  seen  from  these  rules,  the  logic 
associated  with  the  grid  adaptation  strategy  is  represented 
in  a  straightforward  way.  Tracing  the  order  of  execution  of 
the  various  processes  is  simplified  by  the  logic  being  clearly 
described  by  the  rule  text. 

Building  in  Experience 

In  MITOSIS  the  philosophy  was  to  develop  a  simple 
adaptation  strategy  which  worked  well  in  the  majority  of 
cases,  and  then  to  add  new  rules  to  handle  those  situations 
which  failed  with  access  to  only  the  initial  knowledge  base. 
The ease with which expert systems can be expanded as
knowledge about the domain is identified is a major reason
for using expert systems for control, and hence for the hybrid
system approach.

The first situation discovered for which the initial
knowledge base was insufficient involves the solution for
cases where global parameters such as lift and drag coefficients
are not available. Here the rule to determine if global
convergence is achieved is clearly not applicable. To solve
this problem, two new rules were required:

RULE     ADAPTATION WITH AUTOMATIC THRESHOLDS (2)
IF       context .EQ. "adapt"
ANDIF    delta-cl .LE. clift-tol
ANDIF    delta-cd .LE. cdrag-tol
ANDIF    clift .EQ. *0.0
ANDIF    cdrag .EQ. *0.0
ANDIF    levels .LT. max-levels
THEN     ADD levels levels *1.0
ANDTHEN  E2DATA density
ANDTHEN  F2GRAD
ANDTHEN  F2THRS thr-divd thr-fuse
ANDTHEN  F2ADPT thr-divd thr-fuse ncells
ANDTHEN  SHOW ncells
ANDTHEN  SET context "initial"


RULE     FINAL CONVERGENCE REACHED (2)
IF       context .EQ. "adapt"
ANDIF    delta-cl .LE. clift-tol
ANDIF    delta-cd .LE. cdrag-tol
ANDIF    clift .EQ. *0.0
ANDIF    cdrag .EQ. *0.0
THEN     G2WRIT final-soln
ANDTHEN  E2PRNT
ANDTHEN  SET context "return"


These rules are very similar to two rules in the initial
knowledge base, except that they contain additional premise
clauses. Therefore, if clift and cdrag are both zero, these
rules will be found during the forward-chainer's search step,
and will dominate since they contain more premises than the
corresponding rules in the initial knowledge base.


The  second  case  that  required  a  new  rule  concerned  prob¬ 
lems  with  bow  shocks.  The  difficulty  was  linked  to  the  fact 
that  the  shock’s  upstream  motion  that  results  after  the  lead¬ 
ing  edge  grid  is  refined  by  adaptation  passes  through  the 
edge  of  the  embedded  region.  As  a  result,  conservation  er¬ 
rors  are  introduced  by  the  approximations  in  the  embedded 
mesh  formulation.  To  remedy  this  problem,  a  new  rule  was 
added: 


RULE     GROW EMBEDDED REGIONS
IF       context .EQ. "diverged"
THEN     G2READ out-file
ANDTHEN  E2DATA density
ANDTHEN  F2GRAD
ANDTHEN  F2THRS thr-divd thr-fuse
ANDTHEN  F2ADPT thr-divd thr-fuse ncells
ANDTHEN  G2GROW
ANDTHEN  SHOW ncells
ANDTHEN  SET context "initial"


This rule's action clauses specify that, after reading the
previous good solution from disk (G2READ), a simple
adaptation is performed and the embedded regions are then
grown (G2GROW) so as to avoid (or lessen the chance of) the
bow shock penetrating the edge of the embedded region.


The creation of the rules resulted from the following scenario.
For each new (different) case that was executed using
the then-current version of the MITOSIS knowledge base,
and as a result of problems that were encountered, an
engineer/CFD researcher determined the cause of the problem
and fashioned a remedy. The appropriate new rule(s) were
then added to the knowledge base (without any modification
of the MITOSIS program itself) and the case was re-run. By
following this process, the initial set of 10 rules has been
expanded to over 25, greatly improving the robustness of the
adaptation process for a wide variety of flow field topologies.

It is largely because of the way in which an expert system
functions that adding such new rules proves to be a
straightforward process. In principle, this could be implemented
with logic in a conventional program (as the first version of
MITOSIS in fact did). However, the addition of knowledge
to conventional structures results in extremely cumbersome
codes which are unwieldy to maintain and difficult to
understand.

Summary 

•  A  new  hybrid  system  is  proposed  to  treat  complex  nu¬ 
merical  procedures.  It  consists  of  the  three  essential 
components: 

-  a  central  data  pool  containing  all  problem  data; 

-  any  number  of  procedural  elements  (written  in 
conventional  programming  languages)  in  which 
the  vast  majority  of  computational  resources  are 
consumed;  and 

-  an  expert  system  which  schedules  and  supervises 
the  execution  of  the  various  procedural  elements. 

•  The  hybrid  system  is  a  powerful  programming  strategy 
because  it  is: 

-  expandable  -  new  rules  concerning  the  scheduling 
of  the  procedural  elements  for  new  problems  can 
be  easily  added; 

-  efficient  -  the  vast  majority  of  computer  resources 
are  consumed  in  the  procedural  elements  and 
therefore  the  efficiency  corresponds  to  that  of  pro¬ 
cedural  systems. 

-  understandable  -  knowledge  concerning  the 
scheduling  of  the  procedures  is  explicitly  stated 
in  the  rules,  making  traces  easier  to  understand. 

-  maintainable  -  since  procedures  are  only  con¬ 
nected  through  the  central  data  pool,  extensions 
and  maintenance  are  simplified. 

-  dynamic  -  the  system  adapts  to  new  input  sets 
and  output  sets,  scheduling  only  those  procedures 
which  are  essential. 

-  comprehensive  -  the  expertise  of  more  than  one 
expert  can  be  combined,  making  each  user  effec¬ 
tively  more  capable. 


•  The  hybrid  system  approach  is  quite  effective,  as 
demonstrated  by  its  application  to  the  MITOSIS  adap¬ 
tive  grid  program. 

Acknowledgements 

This  work  was  supported  by  AFOSR  Grant  82-0136, 
Dr.  James  D.  Wilson  technical  monitor. 


References 

[1] Buchanan, B.G., and Shortliffe, E.H., Rule-Based Expert
Systems: The MYCIN Experiments of the Stanford
Heuristic Programming Project, Reading, MA:
Addison-Wesley Publishing Company, 1984.

[2] Lindsay, R.K., Buchanan, B.G., Feigenbaum, E.A.,
and Lederberg, J., Applications of Artificial Intelligence
for Chemical Inference: The DENDRAL Project, New
York: McGraw-Hill, 1980.

[3] Tong, S.S., "Design of Aerodynamic Bodies Using
Artificial Intelligence/Expert System Technique",
AIAA-85-0113, January 1985.

[4] Conner, R.S., and Purdon, D.J., "PAN AIR Knowledge
System", AIAA-86-0239, January 1986.

[5] Andrews, A.E., "Progress and Challenges in the
Application of Artificial Intelligence to Computational Fluid
Dynamics", AIAA-87-0593, January 1987.

[6] Dannenhoffer, J.F., and Baron, J.R., "Robust Grid
Adaptation for Complex Transonic Flows",
AIAA-86-0495, January 1986.

[7] Harmon, P., and King, D., Expert Systems: Artificial
Intelligence in Business, New York: John Wiley and
Sons, Inc., 1985.

[8] Kraft, A., "XCON: An Expert Configuration System at
Digital Equipment Corporation", in The AI Business:
The Commercial Uses of Artificial Intelligence, edited
by P. H. Winston and K. A. Prendergast, Cambridge,
MA: The MIT Press, 1984.

[9] Winston, P.H., Artificial Intelligence, Reading, MA:
Addison-Wesley Publishing Company, 1984.

[10] Brownston, L., Farrell, R., Kant, E., and Martin, N.,
Programming Expert Systems in OPS5: An Introduction
to Rule-Based Programming, Reading, MA:
Addison-Wesley Publishing Company, 1986.

[11] Dannenhoffer, J.F., "Grid Adaptation for Complex
Two-Dimensional Transonic Flows", Sc.D. thesis,
Massachusetts Institute of Technology, Cambridge, MA (in
preparation).




APPENDIX 


AIAA  87-1167-CP 

Adaptation Methods for
a New Navier-Stokes Algorithm

John  G.  Kallinderis 
Judson  R.  Baron 

Massachusetts  Institute  of  Technology 
Cambridge,  MA 


AIAA  8th  Computational  Fluid  Dynamics  Conference 
June  9-11,  1987  /  Honolulu,  Hawaii 


AIAA-87-1167-CP
Honolulu, June 1987


Adaptation Methods for
a New Navier-Stokes Algorithm


John G. Kallinderis*

Judson R. Baron*

Computational Fluid Dynamics Laboratory
Department of Aeronautics and Astronautics
Massachusetts  Institute  of  Technology 
Cambridge,  MA  02139 


Abstract 

Various adaptation techniques for the computation of 2-D
viscous flows are presented. An initially coarse grid is
automatically embedded locally via a feature detection
algorithm to provide accurate predictions of boundary-layer
regions. The need for resolution in a specific direction can
be used to limit embedding to that direction. Lastly, within
appreciably viscous regions the full Navier-Stokes equations
are solved, while for the remaining areas the description
is reduced to the Euler equations. The current procedure
combines the three adaptation techniques (viscous,
directional, and equation) and attains equivalent accuracy with
more than an order of magnitude increase in efficiency over
non-adaptive methods.

The basic algorithm uses a new finite volume scheme
that has been developed for the discretisation of the
viscous terms and has the conservation property, which is of
some importance when shocks are present. Example flow
fields that are considered include circular arc cascades in
both subsonic and supersonic flow. Comparisons are made
with previous results.

INTRODUCTION 

In recent years, considerable progress has been made in
the development of numerical methods for the solution of
the Navier-Stokes equations. Most of those methods, however,
are not practical for the calculation of complicated
flows in a design environment. The primary reason is that
the efficiency of the current algorithms is poor and makes
it difficult to obtain accurate results. Very fine resolution is
needed, which results in long computation times even with
the use of available supercomputers. An approach often
adopted is to use a simplified set of equations to describe
the flow field (e.g. potential flow or Euler equations). Such
approximations have proven successful for specific kinds of
flow fields, but cannot cope with complicated flows, as the
flows of engineering interest usually are.

The classical way of achieving the needed resolution is
to cluster grid points in regions with high flow gradients.
This makes grid generation more difficult (especially in
3-D). More important is that the specific clustering procedure
may also create problems related to accuracy and stability
due to the resulting grid stretching and skewness. Even
for simple geometries, clustering also frequently results in
unnecessary resolution in some regions of the domain.

A promising approach is to embed an initially relatively
coarse grid locally in regions with large flow-gradients (e.g.
boundary-layers, shocks, wakes etc). In order to accomplish
this the algorithm must sense high gradient regions and
automatically divide the grid-cells in such regions. This
approach has been used for the resolution of shocks in flow
fields described by Euler equations (1,4,5,0,9,12).

Flow  fields  involving  multiple  scale  phenomena,  such  as 
represented  by  boundary  layers  and  shocks,  are  of  primary 
interest  here.  In  order  to  predict  such  flows  accurately 
and  efficiently,  special  adaptation  techniques  as  well  as  a 
discretisation  scheme  for  the  viscous  terms  in  the  Navier- 
Stokes  system  have  been  developed. 

The inviscid adaptation method mentioned above has been
extended to include viscous regions (viscous adaptation).
However, unnecessary and therefore inefficient embedding
is avoided by locally refining cells only in the direction of
flow gradients (directional adaptation). Similarly, the
viscous terms are evaluated only in those regions where
viscous stresses are appreciable. Elsewhere the description is
reduced to the Euler equations (equation adaptation).

In order to correctly model shocks, a conservative
discretization scheme is required. The present approach is
an extension of an existing finite volume, Lax-Wendroff
type scheme [10], with viscous terms added and suitably
discretized.

The  numerical  scheme  is  presented  first  and  validated  by 
comparison  with  existing  solutions.  Then  the  adaptive  tech¬ 
niques  are  described  and  computed  results  will  be  presented 
to  demonstrate  their  accuracy.  Finally,  the  efficiency  of 
these  techniques  is  discussed. 


GOVERNING  EQUATIONS 


The Navier-Stokes equations written in Cartesian
two-dimensional conservation form are:

$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = \frac{\partial R}{\partial x} + \frac{\partial S}{\partial y}$$

where $U$, $F$ and $G$ are the state vector and the convective flux
vectors in the x and y-directions respectively. The viscous flux
vectors are $R$ and $S$, where $\tau_{ij}$ are the viscous stresses.
For a perfect gas, the pressure is related to the specific internal
energy $e$ by

$$p = (\gamma - 1)\left[ e - \frac{\rho}{2}\left(u^2 + v^2\right) \right]$$


After  nondimensionalising  the  above  equations,  Mach  and 
Reynolds  numbers  appear  as  parameters.  Sutherland’s  law 
was  used  for  the  viscosity  coefficient  for  all  computations. 
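
The two closure relations mentioned above (the perfect gas pressure relation and Sutherland's law) can be sketched as follows. This is an illustration only: the function names are hypothetical, the calculations in the paper are nondimensional, and the dimensional Sutherland constants used here are commonly quoted values for air, assumed for the example rather than taken from the paper.

```python
def pressure(rho, rho_u, rho_v, e, gamma=1.4):
    """Perfect gas pressure from the conserved variables (rho, rho*u, rho*v, e).

    Implements p = (gamma - 1) * [e - (rho / 2) * (u^2 + v^2)].
    """
    u = rho_u / rho
    v = rho_v / rho
    return (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))


def sutherland_viscosity(T, mu_ref=1.716e-5, T_ref=273.15, S=110.4):
    """Sutherland's law for the viscosity coefficient.

    mu_ref [Pa s], T_ref [K] and S [K] are assumed values for air.
    """
    return mu_ref * (T / T_ref) ** 1.5 * (T_ref + S) / (T + S)
```

For example, `pressure(1.0, 0.0, 0.0, 2.5)` gives a pressure of approximately 1.0 for a gas at rest with `gamma = 1.4`, and `sutherland_viscosity(T_ref)` returns `mu_ref` by construction.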

NUMERICAL  SCHEME 


An explicit, finite volume numerical scheme previously
developed by Ni [10] for the Euler equations
was used for the discretisation of the convective terms.
The method uses Lax-Wendroff type marching in time and
consists of two basic operations. The first of these
evaluates the first order temporal terms at the grid nodes (e.g.
at E in Figure 1). These terms involve first order spatial
derivatives which are evaluated by averaging the surrounding
primary control volume center values (e.g. at a through
d). The second operation determines the second order
temporal terms at the grid nodes. These contributions involve
second order spatial derivatives which are calculated using
a spatially translated control volume (e.g. abcd in Fig. 1).
This operation involves a piecewise integration around this
secondary control volume.


Viscous  terms 


The above Euler scheme has been extended by including
the viscous terms of the Navier-Stokes equations. Various
other extensions of Euler schemes without adaptation are
found in [8] (for the Ni-scheme) and [13] (for a Runge-Kutta
scheme).

The first order temporal viscous terms involve second
order derivatives, and the second order temporal terms involve
4th order derivatives. It is important to minimise the
number of nodes which contribute to node E; thus, since our
interest is in the steady state, only those viscous terms which
provide first order temporal accuracy are kept. The
spatially translated cell abcd is used to compute them.


We illustrate the discretisation of the viscous terms by
consideration of the viscous term $u_{xx}$ at node E. Using
Green's theorem for the volume abcd, we have:

$$(u_{xx})_E = \frac{1}{S_{abcd}} \oint_{abcd} u_x \, dy = \frac{1}{S_{abcd}} \left[ (u_x)_{ab}\,\Delta y_{ab} + (u_x)_{bc}\,\Delta y_{bc} + (u_x)_{cd}\,\Delta y_{cd} + (u_x)_{da}\,\Delta y_{da} \right]$$



Figure  1:  Computational  Grid 
where $S_{abcd}$ is the area of the cell abcd, $\Delta y_{cd} = y_d - y_c$,
etc. The first order derivative at the face cd of the control
volume abcd is evaluated employing the area EcFd. Similar
volumes are used for the other face derivatives. Thus,

$$(u_x)_{cd} = \frac{1}{S_{EcFd}} \left[ u_F\,\Delta y_F + u_c\,\Delta y_c + u_E\,\Delta y_E + u_d\,\Delta y_d \right]$$

where $S_{EcFd}$ is the area of EcFd, and

Ayp  =  ( Ay/p  -  Ayrc)/2 

Ayc  =  (y/f  *  ye)/2  -  (yr  -t-  yr)/2 

t tc  =  (us  +  US+UI+  »p)/1 

etc. The above discretization is conservative. The
discretization of the convective terms allows odd-even modes
to appear in both directions, but the discretization of the
viscous terms does not allow those modes to appear.
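
The Green's theorem evaluation used above for the face derivatives can be illustrated with a short sketch. It is a generic contour-integral derivative on an arbitrary polygon, assuming counter-clockwise vertex ordering and the trapezoidal rule along each edge; the function name `green_dudx` is hypothetical and the code is not the paper's implementation, which works with the specific nodes and cell centers of Figure 1.

```python
def green_dudx(vertices, u):
    """Area-averaged du/dx over a polygon via Green's theorem.

    vertices: list of (x, y) tuples in counter-clockwise order.
    u:        values of u at the same vertices.
    Returns (1 / S) * contour integral of u dy, where S is the polygon area.
    """
    n = len(vertices)
    area = 0.0
    integral = 0.0
    for i in range(n):
        j = (i + 1) % n
        x0, y0 = vertices[i]
        x1, y1 = vertices[j]
        area += 0.5 * (x0 * y1 - x1 * y0)          # shoelace formula
        integral += 0.5 * (u[i] + u[j]) * (y1 - y0)  # trapezoidal rule on the edge
    return integral / area
```

For a linearly varying field the result is exact even on a skewed quadrilateral. Because each edge contribution appears with opposite sign in the two volumes sharing that edge, contributions of this kind cancel when summed over the grid, which is the sense in which such a discretization is conservative.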

In  order  to  accelerate  convergence  to  the  steady  state  a 
multiple  grid  method  [10]  was  used  which  acts  only  on  the 
convective  terms.  Its  function  is  to  accelerate  the  propaga¬ 
tion  of  information  by  using  coarser  than  the  basic  grids. 

Odd-even modes were suppressed in the essentially inviscid
portion of the flow with the aid of a nine-point Laplacian
smoothing operator of the form
$\mu (U_a + U_b + U_c + U_d - 4U_E)$. For shock capturing the
following smoothing operator in conservative form was used:

$$\sigma \left( |\Delta_x p|\,\Delta_x U + |\Delta_y p|\,\Delta_y U \right)$$

where $\Delta_x(\cdot)$, $\Delta_y(\cdot)$ denote differences in the x, y
directions. The pressure terms are necessary in order to switch
the smoothing on at the shock regions and off away from
them. Typical values for the smoothing coefficients were
$\mu = 0.002$, $\sigma = 0.01$.
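
The role of the pressure switch can be seen in a one-dimensional sketch. The form below is only one possible reading of the operator above, written as a difference of face fluxes so that it is conservative; the function name `shock_smoothing_1d` and the treatment of the end nodes are assumptions made for this illustration.

```python
def shock_smoothing_1d(U, p, sigma=0.01):
    """Pressure-switched smoothing term at each interior node of a 1-D grid.

    The flux on the face between nodes i and i+1 is
    sigma * |p[i+1] - p[i]| * (U[i+1] - U[i]); the smoothing term at a node
    is the difference of the fluxes on its two faces. End nodes are left at zero.
    """
    n = len(U)
    flux = [sigma * abs(p[i + 1] - p[i]) * (U[i + 1] - U[i]) for i in range(n - 1)]
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = flux[i] - flux[i - 1]
    return out
```

With uniform pressure every term vanishes, so the smoothing is switched off away from shocks. Across a pressure jump the terms on the two sides are equal and opposite, so the total of the smoothed quantity is unchanged.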

An  important  property  of  the  above  scheme  is  that  all 
operations  can  be  performed  in  a  piecewise  sense  within 
each  cell  without  the  need  for  any  information  from  the 
outside.  This  is  very  useful  in  dealing  with  unstructured 
grids,  as  will  be  seen  in  the  following. 

Solver  Validation 

A test of the Navier-Stokes numerical scheme has been
carried out for problems for which comparisons with previous
results were possible. The first case is a flat plate in
a supersonic flow at $M_\infty = 3$ and $Re = 10^3$ (Fig. 2) with a
65x65 grid.

Four quantities ($p_0$, $T_0$, $v/u$, $p_\infty$) were specified at the
inlet plane, and a no slip condition, temperature, and
extrapolated pressure at the solid surface boundary. At the
upper boundary of the domain a tangency condition was
applied. All state variables were extrapolated at the supersonic
parts of the exit plane, and in the subsonic parts the
pressure was fixed at the value in the immediately adjacent
supersonic part. In Figure 2 we compare the skin friction
distribution at the wall with that from [2].

A second case is a 10% circular arc cascade in a subsonic
flow of $M_\infty = 0.5$ and $Re = 6 \times 10^3$ (see Fig. 3) with a 63x33
grid. The wall-pressure and $C_f$ curves for this example are
compared with [3,8,11] (see Figure 3). The agreement is
excellent for both cases.

ADAPTIVE  TECHNIQUES 
Local  Grid  Refinement 

Accuracy is achieved with a minimal amount of computational
effort by embedding several levels of finer grids only
in those regions of the domain where important features
exist. This can be accomplished by simply subdividing cells
of the initial coarse grid in both cell-directions (Fig 4). In
this way the embedded and initial grids are topologically
similar. This means that if the initial grid is uniform and
orthogonal, these desirable properties characterise the
embedded meshes as well. However, depending on cell and
feature orientation, there are situations in which resolution
is needed primarily in one direction in the vicinity of the
feature. In that case it is advantageous to divide the cell only
in that direction and thus avoid the creation of unnecessary
cells (Fig 4) (directional refinement).
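
The two kinds of cell division described above can be sketched as follows. This is an illustrative sketch only: a cell is represented by its four corner coordinates in counter-clockwise order, and the function names and the direction labels "l" and "m" (for the two cell-directions) are assumptions made for the example.

```python
def midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def subdivide(cell, direction="both"):
    """Divide a quadrilateral cell given as corners (p0, p1, p2, p3).

    direction="both": four children (division in both cell-directions).
    direction="l":    two children, dividing edges p0-p1 and p3-p2.
    direction="m":    two children, dividing edges p1-p2 and p0-p3.
    """
    p0, p1, p2, p3 = cell
    m01, m12 = midpoint(p0, p1), midpoint(p1, p2)
    m23, m30 = midpoint(p2, p3), midpoint(p3, p0)
    if direction == "l":
        return [(p0, m01, m23, p3), (m01, p1, p2, m23)]
    if direction == "m":
        return [(p0, p1, m12, m30), (m30, m12, p2, p3)]
    if direction == "both":
        c = midpoint(m01, m23)  # cell center
        return [(p0, m01, c, m30), (m01, p1, m12, c),
                (c, m12, p2, m23), (m30, c, m23, p3)]
    raise ValueError("direction must be 'both', 'l' or 'm'")
```

Repeated over k levels, division in both directions produces 4^k cells from one parent, whereas directional division produces only 2^k, which is the saving that directional refinement aims for.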

Equation  Adaptation 

The magnitude of viscous stresses generally decreases
very rapidly away from solid boundaries. On the other hand
the evaluation of viscous terms is quite expensive. Thus the
approach here has been to introduce a criterion to monitor
the need for solution of the full Navier-Stokes equations,
and to do so only when required by the presence of shear.
The Euler equations then are applicable everywhere else.
This is a relatively easy procedure with the present explicit
algorithm because each cell is integrated independently at
each time-step. Essentially different integrators can be used
for cells in which different physics dominates.

Feature Detection

It  is  essential  in  the  above  adaptation  technique  that  the 
algorithm  be  able  to  sense  the  existence  and  track  the  evo¬ 
lution  of  special  features.  The  dominant  features  of  interest 
are  shear  layers  and  shocks. 

There  is  a  choice  of  flow  parameters  that  can  be  used  to 
detect  the  above  features,  including  velocity,  pressure,  den¬ 
sity  and  Mach  number  distributions.  Variations  which  have 
been  examined  include  undivided  and  divided  differences  for 
these parameters. Shear layers and shocks are very different
flow phenomena with completely different scales. Use
of only one of the above parameters is inadequate. A
combination of suitable parameters proves to be necessary.

Figure 2: Supersonic flat plate ($M_\infty = 3$, $Re = 10^3$)

Figure 3: 10% Circular Arc Cascade ($M_\infty = 0.5$, $Re = 8 \times 10^3$)

In Figures 5 and 6 we illustrate the influence of various
criteria on the detection of a boundary layer and a shock
separately. Fig. 5 illustrates a subsonic flow field with a
boundary layer being the main feature, while Fig. 6 shows
the same field in supersonic flow. In the latter case, a shock
is formed at the leading edge and is reflected at the upper
boundary. Both of these flow fields will be described in more
detail in the next sections. All the criteria are applied in
a directional sense (l, m being the cell-directions). The cells
within the boundary layer are generally much smaller than
those in the inviscid region. As a consequence, an undivided
difference of a detection parameter at an inviscid cell can
be of the same magnitude as those at a viscous cell. Therefore
divided differences are required for the correct capture
of shear layers. On the other hand, the use of parameter
gradients when detecting shock regions would lead to
increasing gradients after each adaptation and a decision to
continue adapting. Thus we have used both divided differences
for shear layers and undivided differences for shocks.
It is evident that the use of density leads to a conservative
number of embedded cells. More important is its
inappropriateness for incompressible flow. Mach number is a poor
parameter for shock detection in the case of weak
discontinuities. However, pressure and velocity differences perform
quite well in detecting shocks. Our current approach is to
use velocity gradients in order to detect the viscous regions
and velocity differences in order to detect shocks.

In addition to parameter selection, a choice must be made
of threshold levels for the detection parameters. The threshold
is set by using average and standard deviation values
of the parameters. More specifically we use threshold =
$\Phi_{ave} + \alpha\,\sigma_\Phi$, where $\Phi_{ave}$, $\sigma_\Phi$ are the average and standard
deviation values of the detection parameter $\Phi$, and $\alpha$ is a
weighting factor chosen empirically but found to be applicable
for a variety of flow fields and conditions.
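
The detection criteria above can be sketched on a one-dimensional line of cells. The code is a hedged illustration: the function names are hypothetical, the population standard deviation is an assumed choice (the text does not say which estimator is used), and the sample profile and the weighting factor 0.5 are made up for the example.

```python
from statistics import mean, pstdev


def undivided_difference(phi):
    """|phi[i+1] - phi[i]| across each cell (used here for shock detection)."""
    return [abs(phi[i + 1] - phi[i]) for i in range(len(phi) - 1)]


def divided_difference(phi, x):
    """|phi[i+1] - phi[i]| / (x[i+1] - x[i]) across each cell (shear layers)."""
    return [abs(phi[i + 1] - phi[i]) / (x[i + 1] - x[i]) for i in range(len(phi) - 1)]


def flag_cells(param, alpha):
    """Flag cells whose detection parameter exceeds average + alpha * std. dev."""
    threshold = mean(param) + alpha * pstdev(param)
    return [value > threshold for value in param]


# Hypothetical velocity profile: three thin cells near a wall, three large
# cells in the outer flow.
x = [0.0, 0.01, 0.02, 0.03, 1.0, 2.0, 3.0]
u = [0.0, 0.3, 0.6, 0.9, 1.0, 1.3, 1.3]
shear_flags = flag_cells(divided_difference(u, x), alpha=0.5)
raw_flags = flag_cells(undivided_difference(u), alpha=0.5)
```

In this made-up profile the undivided difference across one of the large outer cells is as big as across the thin wall cells, so `raw_flags` marks it as well, whereas the divided difference in `shear_flags` singles out only the three wall cells.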


Data  Structure 


Very important considerations in the adaptive scheme relate
to the storage and availability of the information which
is necessary for the various calculations. A special data
structure is required to service the unstructured grid arising
from adaptation. Since each cell is computed independently
of its neighbors, the grid structure has no impact on
Figure 5: Boundary layer detection
Figure 6: Shock detection
Figure 7: Basic Data Structure


the solver, which remains the same for any grid topology. In
particular the pointer system is based here on cells rather
than blocks of cells, and this provides great flexibility. A
system of cellwise pointers, similar to finite element connectivity
arrays, keeps track of the information required by the
solver. In Figure 7 we illustrate the basic principle of storing
information at the cells of an unstructured grid. The four
nodes corresponding to each one of the cells are illustrated
(e.g. cell B has the nodes 2, 5, 6, 3; and node 9 is appointed
to cells D and E).
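
A minimal sketch of such cellwise pointers follows. It reads the node list of cell B as 2, 5, 6, 3 and uses the stated fact that node 9 belongs to cells D and E; every other entry in `cell_nodes`, and the helper name `build_node_to_cells`, is hypothetical.

```python
from collections import defaultdict

# Cell-to-node pointers: each cell stores its four corner nodes.
# Only cell B's nodes and the membership of node 9 in cells D and E
# come from the text; the remaining entries are hypothetical.
cell_nodes = {
    "A": (1, 4, 5, 2),
    "B": (2, 5, 6, 3),
    "C": (4, 7, 8, 5),
    "D": (5, 8, 9, 6),
    "E": (8, 10, 11, 9),
}


def build_node_to_cells(cell_nodes):
    """Invert the connectivity: for every node, list the cells that use it."""
    node_cells = defaultdict(list)
    for cell, nodes in cell_nodes.items():
        for node in nodes:
            node_cells[node].append(cell)
    return dict(node_cells)


node_cells = build_node_to_cells(cell_nodes)
print(node_cells[9])
```

Because the solver only ever asks a cell for its own nodes, the same loop works whether the cells form a regular block or an irregular embedded patch.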


Interface  Treatment 


The existence of embedded regions within the interior of
the domain introduces internal boundaries (Fig. 8) which
must be treated carefully. Stability and accuracy are the
two important considerations in the numerical treatment at
such an interface. The use of directional refinement introduces
several additional types of interfaces and this imposes
another requirement on the choice of interface treatment.
The treatment preferably should be simple and easily
extended to 3-D.

An interface poses problems in the basic solver for two
reasons. The first is that there are cells containing five nodes,
whereas the scheme is designed for cells with nodes at only
the four corners. The second is that at the interface there
is an abrupt change in the size of the cells. This poses
accuracy problems for a Navier-Stokes solver.

One way to approach interface problems is to perform a
special integration for those cells which involve more than
four nodes. This is accomplished by modifying the scheme
in such a way as to include the additional interface nodes in
the integration procedure. It follows that a different integration
would then be required for different types of interfaces.
This poses a number of problems when using directional embedding
as here, and especially for 3-D fields, because many
different types of interfaces appear.

Our approach has been to treat all possible interface configurations
in a unique way. This is accomplished by disregarding
‘problematic’ interface cells and using other existing
cells of the domain instead in order to perform the calculations
at the interfaces. More specifically let us examine one
type of interface in Figure 8. The nodes a, b are integrated
using the parent cell B instead of cells C and D, and ignoring
the center node c. Thereafter the values at node c follow
by interpolating from nodes a, b. The same approach is employed
for all kinds of interfaces. This treatment is easily
extended to 3-D and has proven to be accurate and robust
in the cases considered so far, even when a shock intersects
an interface.
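
The interpolation step at an interface node can be sketched as follows. The text says only that the center node is interpolated from its two neighbours; taking it as the midpoint of the edge with linear interpolation is an assumption, and the function name and state values are hypothetical.

```python
def interpolate_hanging_node(value_a, value_b):
    """Value at the interface (center) node from its neighbours a and b.

    Assumes the node is the midpoint of edge a-b and that linear
    interpolation is used; the source does not state the formula.
    """
    return [0.5 * (va + vb) for va, vb in zip(value_a, value_b)]


# Hypothetical state vectors (rho, rho*u, rho*v, E) at nodes a and b,
# obtained by integrating on the parent cell.
W_a = [1.0, 0.5, 0.0, 2.5]
W_b = [1.2, 0.7, 0.1, 2.7]
W_c = interpolate_hanging_node(W_a, W_b)
print(W_c)
```

The same helper serves every interface type, since it needs only the two end nodes of the coarse edge.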

Adaptive Solution Procedure

The  solution  algorithm  that  was  used  consists  of  the  fol¬ 
lowing  steps: 

1. Initialise the field with a uniform, orthogonal coarse
grid on which the Navier-Stokes description is applied.

2. Monitor the residual until it falls below a prespecified
value; detect the main flow features; refine the grid
locally.

3. Continue the computation on the updated grid using
Euler/Navier-Stokes solvers for the inviscid/viscous
cells.

4. Repeat steps 2, 3 for a specified number of cycles.

5. March the solution to steady state.

At each embedding level the error wavelengths corresponding
to the cell sizes of that level are smoothed. In
that sense, adaptation plays the role of a multigrid method.
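
The control flow of the five steps can be sketched as a small driver. Everything here is a stand-in: `iterate`, `residual`, and `refine` are hypothetical callables representing the flow solver, the residual monitor, and the detect-and-embed stage, and the toy versions below only mimic their effect on a residual value.

```python
def adaptive_solve(state, iterate, residual, refine,
                   adapt_tol, steady_tol, n_cycles, max_iters=1000):
    """Driver for the adaptation cycle; all callables are hypothetical."""

    def converge(s, tol):
        # Iterate the solver until the residual drops below tol.
        for _ in range(max_iters):
            if residual(s) < tol:
                break
            s = iterate(s)
        return s

    for _ in range(n_cycles):               # step 4: repeat steps 2 and 3
        state = converge(state, adapt_tol)  # steps 2/3: compute on current grid
        state = refine(state)               # step 2: detect features, refine
    return converge(state, steady_tol)      # step 5: march to steady state


# Toy stand-ins: the "solver" halves the residual each iteration, and each
# refinement adds one embedding level and temporarily raises the residual.
def toy_iterate(s):
    return {"res": 0.5 * s["res"], "level": s["level"]}


def toy_residual(s):
    return s["res"]


def toy_refine(s):
    return {"res": 10.0 * s["res"], "level": s["level"] + 1}


final = adaptive_solve({"res": 1.0, "level": 0}, toy_iterate, toy_residual,
                       toy_refine, adapt_tol=1e-2, steady_tol=1e-6, n_cycles=3)
print(final["level"], final["res"])
```

The jump in residual after each call to `toy_refine` imitates the upset that embedding introduces into a convergence history.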

Coupling of Grids

Two distinct types of cells are involved during the solution
procedure.  There  are  the  locally  finest  cells  that  are  used 
by  the  basic  solver  to  integrate  the  equations,  and  the  mul¬ 
tiple  grid  coarser  cells  which  are  used  by  the  multiple  grid 
algorithm  after  being  created  from  the  fine  cells  by  deleting 


Figure  9:  Grids’  Coupling 

every  other  grid  line  (Fig.  9).  The  locally  finest  cells  may 
belong  to  either  an  unembedded  or  an  embedded  region  (see 
Fig.  9). 

The solution procedure starts from a sweep of the locally
finest (0-level) cells. Then the multiple grid accelerator is
used within the embedded regions (multiple grid level 1 in
Fig. 9). Finally the multiple grid operator is used throughout
the whole domain (level 2 in Fig. 9). Notice that the
interfaces are ‘invisible’ after completing the multigrid levels
within the embedded regions (level 1 in Fig. 9).

RESULTS 

Model  problems  with  flow  past  a  circular  arc  cascade  in 
both  subsonic  and  supersonic  flow  have  been  used  to  evalu¬ 
ate  the  accuracy  and  efficiency  of  the  adaptation  techniques. 

Subsonic  Flow 

The flow at $M_\infty = 0.5$ and $Re = 8 \times 10^3$ was calculated
using an initial 25x25 mesh with uniform spacing across
the boundary layer and up to three levels of embedding
allowed in both directions. The final grid is shown in Figure
10. The area near the trailing edge proves to need less
embedding because the boundary layer separates and the
fluid is virtually stagnant with negligible stresses. The
three peaks in the convergence history (Figure 10) mark
the upset introduced by the adaptive embedding of the grid.
It is to be noted that adaptation does not alter the overall
slope of the curve.

Indeed the presence of interfaces in the flow field does
not affect the solution. This is illustrated in Figure 11.
In the former we see the separated velocity profile at the
trailing edge station with three marks indicating the position
of the interfaces. In the latter the density contour plots
are shown, and in neither are kinks observed due to the
interfaces. Of course, greater resolution is needed across the
boundary layer than in the streamwise direction. Applying
directional embedding across the boundary layer at the 3rd
level results in significant savings in the number of cells,
with no apparent change in the results on comparing wall
shear distributions obtained using adaptation in either both
directions or directionally (Figure 12).


Supersonic  Flow 

An 8% circular arc cascade, with $M_\infty = 1.4$ and $Re =
23 \times 10^3$, again with a 25x25 initial mesh, was used with uniform
spacing across the boundary layer and an allowance for
three levels of embedding in both cell directions. An oblique
shock  forms  at  the  leading  edge  as  expected  and  is  reflected 
at  the  upper  symmetry  boundary.  The  reflected  shock  then 
interacts  with  the  boundary  layer  at  the  trailing  edge  region. 
In Figure 13, the grid evolution is illustrated and demonstrates
how the embedded grids follow the detailed physics
of the flow. Figure 13 (e) provides an enlarged detail of
the grid near the surface. The boundary layer is essentially
‘lifted’ by the adverse pressure gradient which is induced by
the  reflected  shock.  Simultaneously,  because  of  the  effective 
corner  which  is  formed  by  the  boundary  layer,  compression 
waves  are  formed  upstream  of  the  interaction  region  and 
coalesce  into  a  weaker  shock  which  impinges  on  the  upper 
boundary.  Note  that  passage  of  the  shock  through  interfaces 
does  not  induce  any  stability  problems.  Further  flow  details 
are shown in the Mach number and pressure coefficient, $C_p$,
contour plots in Figure 14. Note that the slip line due to the
Mach reflection at the upper boundary is clearly observed in
the Mach number contour plot (Figure 14). Notice also the
deflection  of  the  reflected  shock  at  the  shock/boundary  layer 
interaction  region,  and  the  separated  recirculating  region  at 
the  trailing  edge.  Expansion  fans  are  formed  at  the  inter¬ 
action  region  because  of  the  effective  wedge  formed  by  the 
separated  boundary  layer.  The  local  embedding  procedure 
appears  to  be  effective  in  capturing  the  detailed  physics  of 
a flow field in the presence of rather complicated multiple-scale
phenomena.

In  order  to  evaluate  the  accuracy  in  this  case,  a  glob¬ 
ally  fine  grid  (97x97)  corresponding  to  two  level  embedding 
was employed and the results were compared with those
obtained by using two levels of local, adaptive embedding.
Figure 15 compares the wall-pressure and wall-shear distributions.
The agreement proves to be excellent.


ADAPTATION  EFFICIENCY 

The fact that the entire Navier-Stokes system is solved
only within the viscous regions leads to a 20% time saving
due solely to the equation adaptation scheme.


[Figure 10 panels: circular arc cascade (subsonic), computational grid, (a) original scales; (c) convergence history against iteration count]

Figure 10: Embedding (3-levels)
Figure 11: Effect of Interfaces
Figure 13: Comparison of wall-shear


The  directional  embedding  which  was  applied  in  the  sub¬ 
sonic  case  at  the  3rd  level  led  to  a  40%  decrease  in  the 
number  of  cells  compared  to  the  cells  formed  if  embedding 
in  both  directions  was  applied.  This  is  not  surprising  in 
view  of  the  fact  that  a  cell  divided  in  both  directions  adds 
three  additional  cells,  while  directional  division  adds  only 
one  additional  cell.  This  results  in  significant  savings  in  the 
number  of  cells,  and  especially  so  at  the  higher  adaptation 
levels when many new cells are created. Directional embedding
should be even more beneficial in turbulent boundary
layers  where  at  least  two  nodes  are  needed  inside  the  lami¬ 
nar  sublayer. 

The subsonic case on a globally fine mesh would require
140 hours on a MicroVAX computer, making the conservative
assumption that the same number of iterations would
be  required  as  for  the  embedded  case.  With  adaptation  in 
both  cell-directions,  only  11  hours  are  required,  and  this  re¬ 
duces  to  6  hours  with  allowance  for  directional  embedding 


at  the  3rd  level.  Finally,  5  hours  are  required  if  equation 
adaptation  is  also  applied.  Overall,  the  adaptive  techniques 
lead  to  a  28  times  reduction  in  CPU  time.  The  gain  would 
be  appreciably  larger  if  more  than  three  adaptation  levels 
were  permitted. 
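
The quoted reduction factor follows directly from the run times given above, as this short check shows. The dictionary keys are descriptive labels chosen here; the hours are those stated in the text.

```python
# CPU hours quoted in the text for the subsonic case.
hours = {
    "globally fine mesh": 140,
    "adaptation in both directions": 11,
    "directional embedding at 3rd level": 6,
    "with equation adaptation": 5,
}

baseline = hours["globally fine mesh"]
# Reduction factor of each variant relative to the globally fine mesh.
reduction = {name: baseline / h for name, h in hours.items()}

for name, factor in reduction.items():
    print(f"{name}: {factor:.1f}x")
```

The last entry gives 140 / 5 = 28, the overall factor reported for the subsonic case.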

In  the  supersonic  case, the  reduction  factor  is  22.  The 
slightly  smaller  advantage  follows  from  the  fact  that  there 
are  more  flow  features  to  resolve,  and  these  result  in  a  larger 
number  of  embedded  cells. 

CONCLUSIONS 

A  conservative  finite  volume  discretisation  of  the  viscous 
terms  of  the  Navier-Stokes  equations  has  been  developed 
using  the  same  stencil  as  for  the  convective  terms. 

Grid  and  equation  adaptation  have  been  carried  out  for 
multiple  and  overlapping  flow  features. 

The combined use of viscous, directional and equation
adaptation has indicated CPU time reduction factors of approximately
25 for subsonic and supersonic cascade examples
for equivalent accuracies.
Abstract

This paper presents a novel Semi-Implicit Navier-Stokes Solver (SINSS). SINSS combines the advantages
of implicit with those of explicit schemes: temporal integration is implicit in the direction
normal to a body and explicit in the direction(s) tangential to it. Numerical stiffness due to disparate
physical scales in the normal direction is eliminated, and the stability of the algorithm depends only
on relatively coarse streamwise grid spacing - not on the typically fine normal spacing. Approximate
factorisation is unnecessary and only one matrix inversion per stage is required.

The semi-implicit solver is applied to a finite volume formulation of the 2-D thin layer Navier-Stokes
equations. Efficiency of the algorithm is studied by comparison of convergence histories of the
semi-implicit  algorithm  with  those  of  a  multigrid  explicit  scheme  and  a  fully  implicit  approximately 
factorised  scheme.  The  effect  of  residual  smoothing  is  also  considered.  Computations  show  that  the 
explicit,  semi-implicit,  and  fully  implicit  schemes  are  of  comparable  efficiency  for  inviscid  calculations. 
SINSS  is  superior  in  high  Reynolds  number  flows,  where  multigrid  loses  effectiveness  and  the  implicit 
scheme  appears  to  have  convergence  problems. 

1 Introduction

Fine spatial resolution for Navier-Stokes simulations is often necessary only in the direction normal
to a body. At the differential equation level this implies that only the boundary-layer-like viscous terms
need to be retained, leading to the so-called thin layer Navier-Stokes equations. At the discrete level
this implies mesh cells will have a much smaller dimension in the normal direction compared to the
streamwise direction. Thus, the stability restriction for explicit schemes in body fitted grid systems is
usually dominated by the small normal spacing and results in numerical “stiffness” in the equations.

The SINSS solver eliminates the stability restriction due to the normal spacing by solving the flow
equations implicitly in the normal direction. However, the solver is explicit in the tangential (flow)
direction, thereby avoiding approximate factorisation (AF) schemes, the CFL limitations associated with
the AF error, and the second (in two dimensions) or second and third (in three dimensions)
block-tridiagonal inversions. This scheme was first presented by [Loyd et al. 86]. Details may be
found in [Loyd et al. 87].

This  paper  presents  the  characteristics  and  efficiency  of  a  semi-implicit  algorithm  as  applied  to  a 
popular  multi-stage  scheme.  First,  we  present  the  Navier-Stokes  equations,  and  a  simple  method  of 
implementing  the  thin  layer  approximation  that  preserves  the  conceptual  simplicity  and  conservative 
property  of  the  finite  volume  approach.  The  semi-implicit  algorithm  is  then  derived  by  considering  time 
linearisation of only the cross flow flux terms. The CFL limit on the time step for explicit schemes
reduces to a one-dimensional CFL restriction based only upon the streamwise terms [Loyd et al. 87].

To evaluate the relative efficiency of the semi-implicit scheme we compare convergence histories obtained
with it to those obtained with an explicit and a fully implicit solver [Beam & Warming 76]. Multigrid
(MG)  and  residual  smoothing  (RS)  are  applied  in  the  explicit  solver  to  accelerate  convergence,  and  RS 
is  also  applied  in  the  streamwise  (explicit)  direction  in  the  semi-implicit  scheme.  The  paper  is  concluded 
with  results  for  inviscid  and  viscous  flow  cases. 


2  2-D  Navier-Stokes  Equations 

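The two-dimensional Navier-Stokes equations integrated in Cartesian coordinates over a control
surface $\Omega$ with boundary $\partial\Omega$ are:

$$\frac{\partial}{\partial t}\iint_{\Omega} \mathbf{W}\,dS + \oint_{\partial\Omega} (\mathbf{F}\,dy - \mathbf{G}\,dx) = 0. \qquad (1)$$

$\mathbf{W} = (\rho \;\; \rho u \;\; \rho v \;\; E)^T$ is the vector of state variables, where $\rho$, $u$, $v$ are density, $x$ and $y$ components of
velocity, and $E$ is total energy. The flux vectors $\mathbf{F}$ and $\mathbf{G}$ are

$$\mathbf{F} = \begin{pmatrix} \rho u \\ \rho u^2 + P + \tau_{xx} \\ \rho u v + \tau_{xy} \\ \rho u H + u\tau_{xx} + v\tau_{xy} - q_x \end{pmatrix}, \qquad
\mathbf{G} = \begin{pmatrix} \rho v \\ \rho u v + \tau_{yx} \\ \rho v^2 + P + \tau_{yy} \\ \rho v H + u\tau_{yx} + v\tau_{yy} - q_y \end{pmatrix}. \qquad (2)$$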

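$H$ is the stagnation enthalpy ($H = (E + P)/\rho$), and the viscous stresses are, with the Stokes hypothesis:

$$\tau_{xx} = \tfrac{2}{3}\mu\left(2\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right), \qquad
\tau_{yy} = \tfrac{2}{3}\mu\left(2\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right), \qquad
\tau_{xy} = \tau_{yx} = \mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right). \qquad (3)$$

$Pr$ is the Prandtl number, $Pr = \mu c_p / k$. The equation of state for a perfect gas closes the system:

$$P = (\gamma - 1)\left[E - \tfrac{1}{2}\rho\,(u^2 + v^2)\right]. \qquad (4)$$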

2.1  Thin  Layer  Approximation 

The  thin  layer  approximation  is  appropriate  for  high  Reynolds  number  flow  cases  where  shear  stresses 
in  the  body  normal  direction  are  much  larger  than  other  viscous  stresses.  We  restrict  our  attention  to 
flows  where  this  is  the  case  and  apply  the  thin  layer  approximation. 

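Because of the assumptions in the thin layer form of the Navier-Stokes equations, the computational
$(\xi, \eta)$ grid must contain a family of lines that is body normal or nearly so. Since the normal direction $\eta$
in general does not correspond to either the $x$ or $y$ coordinate direction, the derivatives in the viscous
stresses must be transformed to $\xi$ and $\eta$ coordinates via the generalised transformation:

$$\frac{\partial}{\partial x} = \frac{\partial \xi}{\partial x}\frac{\partial}{\partial \xi} + \frac{\partial \eta}{\partial x}\frac{\partial}{\partial \eta}
= \xi_x \frac{\partial}{\partial \xi} + \eta_x \frac{\partial}{\partial \eta}, \qquad
\frac{\partial}{\partial y} = \frac{\partial \xi}{\partial y}\frac{\partial}{\partial \xi} + \frac{\partial \eta}{\partial y}\frac{\partial}{\partial \eta}
= \xi_y \frac{\partial}{\partial \xi} + \eta_y \frac{\partial}{\partial \eta}. \qquad (5)$$

Using the assumption $\partial/\partial\eta \gg \partial/\partial\xi$, all $\xi$ derivative terms may be dropped to obtain:

$$\frac{\partial}{\partial x} = \eta_x \frac{\partial}{\partial \eta}, \qquad \frac{\partial}{\partial y} = \eta_y \frac{\partial}{\partial \eta}. \qquad (6)$$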

The  thin  layer  shear  stresses  and  heat  fluxes  become: 

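$$\tau_{xx} = \tfrac{2}{3}\mu\left(2\eta_x\frac{\partial u}{\partial \eta} - \eta_y\frac{\partial v}{\partial \eta}\right), \qquad
\tau_{yy} = \tfrac{2}{3}\mu\left(2\eta_y\frac{\partial v}{\partial \eta} - \eta_x\frac{\partial u}{\partial \eta}\right),$$

$$\tau_{xy} = \tau_{yx} = \mu\left(\eta_y\frac{\partial u}{\partial \eta} + \eta_x\frac{\partial v}{\partial \eta}\right). \qquad (7)$$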


The  viscous  terms  need  only  be  computed  at  faces  1  and  3  (Figure  1).  The  viscous  fluxes  across 
faces  2  and  4  are  discarded  since,  by  assumption,  they  are  small  in  comparison. 
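
As a minimal sketch, the thin layer stresses can be evaluated from the normal derivatives alone. The code assumes the forms tau_xx = (2/3) mu (2 eta_x u_eta - eta_y v_eta), tau_yy = (2/3) mu (2 eta_y v_eta - eta_x u_eta), and tau_xy = mu (eta_y u_eta + eta_x v_eta); the function name and the sample values are hypothetical.

```python
def thin_layer_stresses(mu, eta_x, eta_y, du_deta, dv_deta):
    """Thin layer viscous stresses from eta-derivatives only.

    Assumed forms (xi-derivatives dropped):
      tau_xx = (2/3) mu (2 eta_x u_eta - eta_y v_eta)
      tau_yy = (2/3) mu (2 eta_y v_eta - eta_x u_eta)
      tau_xy = mu (eta_y u_eta + eta_x v_eta)
    """
    tau_xx = (2.0 / 3.0) * mu * (2.0 * eta_x * du_deta - eta_y * dv_deta)
    tau_yy = (2.0 / 3.0) * mu * (2.0 * eta_y * dv_deta - eta_x * du_deta)
    tau_xy = mu * (eta_y * du_deta + eta_x * dv_deta)
    return tau_xx, tau_yy, tau_xy


# Hypothetical check: a grid whose eta lines are aligned with y
# (eta_x = 0, eta_y = 1) and a pure shear layer u(y), v = 0.
tau_xx, tau_yy, tau_xy = thin_layer_stresses(
    mu=2.0, eta_x=0.0, eta_y=1.0, du_deta=3.0, dv_deta=0.0)
print(tau_xx, tau_yy, tau_xy)
```

For this aligned case only the shear stress survives, equal to mu times du/dy, which is the familiar boundary layer result.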


3  Spatial  Discretization 

The  governing  equations  must  be  discretised  before  numerical  solution  can  be  attempted.  Discretisa¬ 
tion  may  proceed  in  two  steps,  spatial  and  temporal.  Although  the  two  steps  are  not  independent,  since 
stability  of  the  temporal  integration  depends  on  the  form  of  the  spatial  operator,  it  is  convenient  to 
consider  them  separately.  In  this  section  we  discuss  the  spatial  discretisation,  as  well  as  the  related 
topics  of  boundary  conditions  and  artificial  viscosity. 
Figure 1: Body Normal Geometry


3.1  Spatial  Difference  Operators 

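The line integrals in (1) are replaced by a discrete summation of fluxes across each cell face:

$$\oint_{\partial\Omega} (\mathbf{F}\,dy - \mathbf{G}\,dx) = \sum_{i=1}^{4} \left(\mathbf{F}_i\,\Delta y_i - \mathbf{G}_i\,\Delta x_i\right). \qquad (8)$$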


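$\mathbf{F}_i$ and $\mathbf{G}_i$ are approximated by simple averages of the flux vectors at the adjacent cells, and, at faces
3 and 1, a contribution from the viscous terms:

$$\mathbf{F}_2 = 0.5\,(\mathbf{F}_{i,j} + \mathbf{F}_{i+1,j}), \qquad
\mathbf{F}_3 = 0.5\,(\mathbf{F}_{i,j} + \mathbf{F}_{i,j+1}) + \mathbf{F}_{v}. \qquad (9)$$

$\Delta$'s indicate differences taken in the counterclockwise direction, e.g.:

$$\Delta y_2 = y_{i+1,j+1} - y_{i+1,j}, \qquad \Delta y_3 = y_{i,j+1} - y_{i+1,j+1}. \qquad (10)$$

The viscous operator is written in finite difference form as either

$$\frac{\partial(\,)}{\partial \eta} = \frac{(\,)_{j+1} - (\,)_{j}}{\Delta\eta} \qquad \text{or} \qquad
\frac{\partial(\,)}{\partial \eta} = \frac{(\,)_{j} - (\,)_{j-1}}{\Delta\eta} \qquad (11)$$

depending on whether the viscous flux at face 3 or at face 1 is desired. The appropriate metric $\eta_x$ or
$\eta_y$ is obtained from the definitions:

$$\eta_x = -\frac{1}{J}\frac{\partial y}{\partial \xi}, \qquad \eta_y = \frac{1}{J}\frac{\partial x}{\partial \xi}, \qquad (12)$$

where $J$ is the Jacobian $J = x_\xi y_\eta - x_\eta y_\xi$. The metrics defining $J$ are calculated with centered
differences.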

The discretisation reduces to a second order accurate centered difference approximation on a Cartesian
grid. That order of accuracy is not maintained on arbitrary grids, and it is important that grids
vary  smoothly  to  limit  degradation  in  accuracy.  It  is  generally  true  that  for  a  given  order  of  accuracy 
a  compact  stencil  such  as  this  will  result  in  a  smaller  truncation  error  than  a  less  compact  stencil. 

Upon  forming  the  viscous  terms  at  each  face,  the  finite  volume  line  integration  proceeds  as  before. 
The  definitions  (11)  are  convenient  because  they  require  only  a  three  point  stencil  for  the  thin  layer 
Navier  Stokes  terms.  Because  it  is  written  in  finite  volume  form  the  scheme  identically  conserves  mass, 
momentum,  and  energy. 
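
The face summation and its conservation property can be sketched for a single quadrilateral cell. Scalar face fluxes stand in for the flux vectors, and the function names and corner coordinates are hypothetical; the counterclockwise differencing follows the convention described above.

```python
def face_differences(corners):
    """Counterclockwise (dx, dy) for faces 1..4 of a quadrilateral cell.

    corners: four (x, y) tuples listed counterclockwise.
    """
    diffs = []
    for k in range(4):
        x0, y0 = corners[k]
        x1, y1 = corners[(k + 1) % 4]
        diffs.append((x1 - x0, y1 - y0))
    return diffs


def net_flux(F_faces, G_faces, corners):
    """Discrete line integral: sum over faces of F_i*dy_i - G_i*dx_i."""
    return sum(F * dy - G * dx
               for F, G, (dx, dy) in zip(F_faces, G_faces,
                                         face_differences(corners)))


# A uniform flux field gives zero net flux on any closed cell.
quad = [(0.0, 0.0), (2.0, 0.1), (2.2, 1.0), (-0.1, 0.9)]
uniform = net_flux([2.0] * 4, [3.0] * 4, quad)

# With F = x at the face midpoints and G = 0 the sum returns the cell area.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = net_flux([0.5, 1.0, 0.5, 0.0], [0.0] * 4, square)
print(uniform, area)
```

Because each face flux is shared by the two cells it separates, what leaves one cell enters its neighbour, which is the sense in which the finite volume form conserves mass, momentum, and energy.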


3.2  Artificial  Viscosity 

The discretisation described above allows odd-even decoupling of state vector values at adjacent
points. Due to aliasing errors, this decoupling may inhibit convergence of the temporal integration.
Also, in inviscid regions one finds that discontinuities in the flow field may cause divergence of the
algorithm. Artificial viscosity is added to the physical fluxes to damp out the non-physical odd-even
oscillations and stabilise the integration in areas of discontinuous flow such as found around a shock
wave. We use a pressure weighted blend of second and fourth differences that has been established as
particularly effective [Jameson et al. 81]. In regions of viscous flow the artificial viscosity is turned off.


3.3  Boundary  Conditions 

Numerical boundary conditions must be imposed at the body and at inflow and outflow. For viscous
cases we set the velocity at the body equal to zero and maintain only viscous and pressure contributions
to the flux integrals (8). Riemann invariant boundary conditions are specified at the inflow, and, for
inviscid cases, at the outflow. For viscous flows, all quantities except the pressure, which is set from
the freestream, are extrapolated from the interior at the outflow boundary.


4 Semi-Implicit Approach

This  section  describes  the  temporal  discretisation  of  the  governing  equations.  We  begin  with  a 
synopsis  of  an  explicit  multistage  approach,  which  also  serves  to  introduce  the  nomenclature.  More 
importantly,  the  semi-implicit  approach  presented  in  the  following  section  is  easily  derived  via  consid¬ 
eration  of  the  explicit  integration. 

Various  techniques  for  accelerating  convergence  are  often  used  with  explicit  schemes.  To  enable  a 
fair  comparison  we  implement  residual  smoothing  and  multigrid  in  the  explicit  scheme  in  Sections  4.4 
and 4.5. We also apply residual smoothing in the explicit direction in SINSS.


4.1  Explicit  Multi-Stage  Integration 

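A popular multistage scheme for fluid dynamic calculations [Jameson et al. 81] is given as:

$$\begin{aligned}
\mathbf{W}^0 &= \mathbf{W}^n \\
\mathbf{W}^1 &= \mathbf{W}^0 - \alpha_1 \frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^0\,\Delta y_i - \mathbf{G}_i^0\,\Delta x_i\big) - \mathbf{D}^0\Big] \\
\mathbf{W}^2 &= \mathbf{W}^0 - \alpha_2 \frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^1\,\Delta y_i - \mathbf{G}_i^1\,\Delta x_i\big) - \mathbf{D}^0\Big] \\
\mathbf{W}^3 &= \mathbf{W}^0 - \alpha_3 \frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^2\,\Delta y_i - \mathbf{G}_i^2\,\Delta x_i\big) - \mathbf{D}^0\Big] \\
\mathbf{W}^4 &= \mathbf{W}^0 - \frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^3\,\Delta y_i - \mathbf{G}_i^3\,\Delta x_i\big) - \mathbf{D}^0\Big] \\
\mathbf{W}^{n+1} &= \mathbf{W}^4 .
\end{aligned} \qquad (13)$$

Superscripts denote temporal stages and $S$ is the cell area. The artificial viscosity operator $\mathbf{D}$ is frozen at the first stage
to minimise computational effort. The constants $(\alpha_1, \alpha_2, \alpha_3)$ are equal to (1/4, 1/3, 1/2), except in the
multigrid scheme. Vectors are in bold print. Matrices will be denoted by [ ].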
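
The structure of the multistage update can be seen on a scalar model problem. This sketch assumes stage coefficients (1/4, 1/3, 1/2, 1), the usual choice for this four stage scheme, and replaces the flux balance by a hypothetical linear residual R(w) = a w, so that dw/dt = -a w.

```python
import math


def multistage_step(w, residual, dt, alphas=(0.25, 1.0 / 3.0, 0.5, 1.0)):
    """One time step of W^k = W^0 - alpha_k * dt * R(W^(k-1)).

    Every stage restarts from W^0, so only W^0 and the latest stage
    value need to be stored.
    """
    w0 = w
    for alpha in alphas:
        w = w0 - alpha * dt * residual(w)
    return w


def decay(w, a=1.0):
    """Model residual for dw/dt = -a*w (a hypothetical test problem)."""
    return a * w


z = 0.5  # a * dt
w_new = multistage_step(1.0, decay, dt=z)
# For a linear residual the four stages reproduce the fourth order
# Taylor polynomial of exp(-z).
taylor = 1 - z + z**2 / 2 - z**3 / 6 + z**4 / 24
print(w_new, taylor, math.exp(-z))
```

The agreement with the Taylor polynomial holds for linear problems only; for the nonlinear flow equations the scheme is used for its stability and damping properties.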

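The time step $\Delta t$ that may be taken is limited by the CFL condition. The four stage scheme above,
with

$$\Delta t = \lambda\,\frac{S}{|\mathbf{V}\cdot d\mathbf{L}|_{max} + C\,|d\mathbf{L}|_{max}}, \qquad (14)$$

where $\mathbf{V} = u\,\mathbf{i} + v\,\mathbf{j}$, $C$ is the speed of sound, $d\mathbf{L} = \Delta x\,\mathbf{i} + \Delta y\,\mathbf{j}$, and $S$ is the cell area, is stable for $\lambda < 2\sqrt{2}$.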


4.2  Semi-Implicit  Integration 

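The explicit time stepping (13) can easily be converted to a semi-implicit discretisation.
Consider the first stage of a multistage scheme with implicit semi-discretisation of the normal component
of the flux vectors:

$$\mathbf{W}^1 - \mathbf{W}^0 = -\alpha_1 \frac{\Delta t}{S}\Big[(\mathbf{F}^1\Delta y - \mathbf{G}^1\Delta x)_1 + (\mathbf{F}^0\Delta y - \mathbf{G}^0\Delta x)_2
+ (\mathbf{F}^1\Delta y - \mathbf{G}^1\Delta x)_3 + (\mathbf{F}^0\Delta y - \mathbf{G}^0\Delta x)_4 - \mathbf{D}^0\Big], \qquad (15)$$

where $S$ is the cell area. Each term in (15) gives the flux across one of the grid faces. Using the standard Newton type
linearisation for $\mathbf{F}$ and $\mathbf{G}$, we write,

$$\begin{aligned}
\mathbf{F}^1 &= \mathbf{F}^0 + \frac{\partial \mathbf{F}^0}{\partial t}\Delta t + O(\Delta t^2) = \mathbf{F}^0 + [A]\,\Delta\mathbf{W} + O(\Delta t^2) \\
\mathbf{G}^1 &= \mathbf{G}^0 + \frac{\partial \mathbf{G}^0}{\partial t}\Delta t + O(\Delta t^2) = \mathbf{G}^0 + [B]\,\Delta\mathbf{W} + O(\Delta t^2)
\end{aligned} \qquad (16)$$

where $[A]$ and $[B]$ are defined as the 4 x 4 Jacobian matrices $[\partial\mathbf{F}^0/\partial\mathbf{W}^0]$ and $[\partial\mathbf{G}^0/\partial\mathbf{W}^0]$, respectively,
and $\Delta\mathbf{W}^1 = \mathbf{W}^1 - \mathbf{W}^0$. Inserting (16) into (15) and reordering gives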

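$$\Big([I] + \alpha_1\frac{\Delta t}{S}\big([A]_1\Delta y_1 + [A]_3\Delta y_3 - [B]_1\Delta x_1 - [B]_3\Delta x_3\big)\Big)\,\Delta\mathbf{W}^1
= -\alpha_1\frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i\,\Delta y_i - \mathbf{G}_i\,\Delta x_i\big) - \mathbf{D}\Big]^0 . \qquad (17)$$

This is an implicit matrix equation for $\Delta\mathbf{W}$. The RHS is the usual semi-discrete form of the residual,
while the LHS differs from $[I]$ due to introduction of the terms from the time linearisation of $\mathbf{F}$ and $\mathbf{G}$.
Because the LHS contains only dependent variables at $(j-1, j, j+1)$, it is a block tridiagonal system
of equations.

Subsequent steps in a multistage scheme have the same form. For a four step scheme:

$$\begin{aligned}
\mathbf{W}^0 &= \mathbf{W}^n \\
[LHS]^0\,\Delta\mathbf{W}^1 &= -\alpha_1\frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^0\Delta y_i - \mathbf{G}_i^0\Delta x_i\big) - \mathbf{D}^0\Big] \\
[LHS]^1\,\Delta\mathbf{W}^2 &= -\alpha_2\frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^1\Delta y_i - \mathbf{G}_i^1\Delta x_i\big) - \mathbf{D}^0\Big] - \Delta\mathbf{W}^1 \\
[LHS]^2\,\Delta\mathbf{W}^3 &= -\alpha_3\frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^2\Delta y_i - \mathbf{G}_i^2\Delta x_i\big) - \mathbf{D}^0\Big] - (\Delta\mathbf{W}^2 + \Delta\mathbf{W}^1) \\
[LHS]^3\,\Delta\mathbf{W}^4 &= -\frac{\Delta t}{S}\Big[\sum_{i=1}^{4}\big(\mathbf{F}_i^3\Delta y_i - \mathbf{G}_i^3\Delta x_i\big) - \mathbf{D}^0\Big] - (\Delta\mathbf{W}^3 + \Delta\mathbf{W}^2 + \Delta\mathbf{W}^1) \\
\mathbf{W}^{n+1} &= \mathbf{W}^3 + \Delta\mathbf{W}^4
\end{aligned} \qquad (18)$$

where $[LHS] = [I] + \alpha\frac{\Delta t}{S}\big([A]_1\Delta y_1 + [A]_3\Delta y_3 - [B]_1\Delta x_1 - [B]_3\Delta x_3\big)$ with $\alpha$ the coefficient of the stage
being solved, $S$ is the cell area, and $\Delta\mathbf{W}^s = \mathbf{W}^s - \mathbf{W}^{s-1}$.
Although the RHS of stages three and four contain more than one vector $\Delta\mathbf{W}$, only one $\Delta\mathbf{W}$ needs
to be stored. Subsequent $\Delta\mathbf{W}$'s are simply added to the stored vector to give $\sum \Delta\mathbf{W}$. The system
(18) can be efficiently inverted with a block tridiagonal Gauss elimination routine.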

The  time  step  in  the  semi-implicit  integration  is  limited  only  by  the  tangential  flux  terms,  which 
were  treated  explicitly.  Equation  14  reduces  to: 


=  <i9> 

where $k$ is the tangential spacing, and $\lambda < 2\sqrt{2}$ [Loyd et al. 87]. The normal spacing is no longer
restrictive.


4.3  Matrix  Conditioning 

For grids with high aspect ratio cells ($\Delta x/\Delta y \gg 1$) the matrices $[LHS]$ become increasingly ill
conditioned. Consider, for example, a rectangular mesh with $\Delta x$ = const. and $\Delta y = \lambda\,\Delta x$, where
$\lambda \ll 1$. Then,

$$[LHS] \approx [I] - \alpha\frac{\Delta t}{S}\big([B]_1\Delta x_1 + [B]_3\Delta x_3\big) \qquad (20)$$

and diagonal dominance is lost as the off diagonal terms increase with $1/\lambda$.

We increase the diagonal dominance of $[LHS]$ by adding implicit smoothing. It is applied by adding
to $[LHS]$ the undivided second difference operator

$$-\mu_i\,[\mathbf{W}_{j+1} - 2\mathbf{W}_j + \mathbf{W}_{j-1}]. \qquad (21)$$

The implicit smoothing does not affect the steady state solution; however, large $\mu_i$'s may inhibit
convergence to steady state.


4.4  Residual  Smoothing 

The Courant number limitation both for the explicit and semi-implicit schemes can be relaxed by
smoothing the residuals. In effect, this increases the stencil of influence of the difference scheme and
thus increases the permissible time step. Convergence acceleration is a result of both the increase in $\Delta t$
and the damping of the residuals. Residual smoothing is best applied implicitly. In two dimensions:

$$(1 - \epsilon_x\,\delta_{xx})(1 - \epsilon_y\,\delta_{yy})\,\bar{\mathbf{R}} = \mathbf{R} \qquad (22)$$

where $\mathbf{R}$ is the vector of residuals, $\bar{\mathbf{R}}$ its smoothed value, and $\delta$ is the undivided second difference operator. In the semi-implicit
scheme residual smoothing is applied only in the explicit direction, since the implicit discretisation and
smoothing has a similar effect in the cross stream direction. Optimal values of the smoothing coefficient
and the new time step may be found by numerical experimentation.

Figure 2: Mach Contours in Channel Flow: Inviscid & Viscous (Re = 2000)
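
In one direction, implicit residual smoothing amounts to a scalar tridiagonal solve. The sketch below assumes the form (1 - eps * delta^2) Rbar = R with the stencil simply truncated at both ends of the line; the boundary treatment, the function name, and the sample residuals are assumptions.

```python
def smooth_residuals(R, eps):
    """Solve -eps*Rbar[j-1] + (1 + 2*eps)*Rbar[j] - eps*Rbar[j+1] = R[j]
    along one grid line with the Thomas algorithm.

    The stencil is truncated at both ends (an assumed boundary treatment).
    """
    n = len(R)
    lower, diag, upper = -eps, 1.0 + 2.0 * eps, -eps
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper / diag
    d_prime[0] = R[0] / diag
    for j in range(1, n):  # forward elimination
        denom = diag - lower * c_prime[j - 1]
        c_prime[j] = upper / denom
        d_prime[j] = (R[j] - lower * d_prime[j - 1]) / denom
    smoothed = [0.0] * n
    smoothed[-1] = d_prime[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        smoothed[j] = d_prime[j] - c_prime[j] * smoothed[j + 1]
    return smoothed


# Hypothetical odd-even oscillating residuals along a streamwise line.
R = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
eps = 0.5
R_bar = smooth_residuals(R, eps)
print(R_bar)
```

The oscillatory component is strongly damped, which is what allows a larger time step to be taken with the smoothed residuals.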


4.5  Multi-Grid  Convergence  Acceleration 


Multigrid can also be used to accelerate convergence. Multigrid works by accelerating the propagation
of information across the grid. The conservation equations are solved on successively coarser grids
to achieve, in effect, a larger difference stencil. Coarse corrections are interpolated back up to the fine
grid which drives the scheme. On coarse grids, substantial 2nd and 4th difference damping is added
to the fluxes to kill short wavelength disturbances. The restriction operator, or forcing function, is
also smoothed with a second difference operator. We use a simple V-type strategy to cycle from fine
through coarse grids and back to fine. Details are given in [Loyd et al. 87].


5  Results 


Following  are  results  of  a  set  of  inviscid  and  viscous  flow  cases.  Figure  2  gives  the  geometries  and 
typical Mach number contours of the two sets of cases. The inviscid case is a $M_\infty = 0.5$ channel flow with
a t/c = 0.1 circular arc bump. The viscous test case is laminar flow at $M_\infty = 0.5$ in a channel, with a
lower wall beginning 1/3 of the way into the channel. Symmetry is assumed at the upper boundaries.
Because both flows are subsonic no second difference smoothing is added to the spatial operator. A
very small amount of fourth difference smoothing (coefficient 0.002) is added.

The  cases  were  run  with  each  of  the  three  schemes.  Considerable  care  was  taken  to  make  a  fair 
comparison,  and  for  each  method,  parameters  used  were  those  that  gave  the  most  efficient  solution.  All 
three  codes  were  written  from  scratch  by  the  first  author.  The  explicit  scheme  simply  requires  setting 
[A] = [B] = 0 in Equation 18. The Beam & Warming scheme uses a three point backward temporal
integration,  although  backward  Euler  temporal  integration  was  also  tried  with  similar  results.  The  CFL 
number  resulting  in  quickest  convergence  was  chosen.  Boundary  conditions,  smoothing  formulation,  and 
flux  balance  formulations  in  the  codes  are  identical. 

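Convergence was taken as the point at which the measure (23), formed over all grid points $(i, j)$
from the rates of change $\Delta(\rho u)/\Delta t$, $\Delta(\rho v)/\Delta t$, and $\Delta E/\Delta t$ (the energy term normalised by a
freestream quantity), fell to $5 \times 10^{-4}$.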


This criterion allows fair comparison of convergence histories calculated with different methods and
time steps. All calculations were made on a DEC MicroVAX II, which is approximately equivalent to a

5.1 Inviscid Flow Cases

Table  1  gives  the  iteration  histories  for  the  inviscid  cases  calculated  with  the  explicit  scheme.  The 
second  and  third  columns  give  the  CPU  time  (in  minutes)  and  iteration  count  for  the  scheme  without 
acceleration  devices.  The  fourth  and  fifth  give  those  results  for  the  scheme  with  multigrid  (MG)  and 
residual  smoothing  (RS).  Each  calculation  was  made  on  a  grid  with  48  streamwise  cells  and  16,  24,  or
32  cells  in  the  cross-stream  direction.


Table 1: Inviscid Solution with Explicit Scheme

              NO ACCEL.            MG & RS
# cells    CPU(m)    ITER      CPU(m)    ITER
   16        291     4177        19       155
   24        754     7295        37       208
   32       1370     9970        71       306

Solutions without acceleration mechanisms are characterised by a very slow convergence rate, due,
in part, to slow damping of pressure waves. Multigrid and residual smoothing are very effective for
this case, resulting in an iteration count reduction of up to a factor of 35, which gives a factor of 20
savings in CPU time.
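The quoted factors can be checked directly against the entries of Table 1. The short sketch below is only illustrative arithmetic; the dictionary layout and the function name `speedups` are assumptions made for this example.

```python
# (CPU minutes, iterations) per number of cross-stream cells, copied from Table 1.
NO_ACCEL = {16: (291, 4177), 24: (754, 7295), 32: (1370, 9970)}
MG_RS = {16: (19, 155), 24: (37, 208), 32: (71, 306)}


def speedups(baseline, accelerated):
    """Return {cells: (CPU ratio, iteration ratio)} of baseline over accelerated."""
    ratios = {}
    for cells, (cpu0, iters0) in baseline.items():
        cpu1, iters1 = accelerated[cells]
        ratios[cells] = (cpu0 / cpu1, iters0 / iters1)
    return ratios


if __name__ == "__main__":
    for cells, (cpu_ratio, iter_ratio) in speedups(NO_ACCEL, MG_RS).items():
        print(f"{cells:2d} cells: CPU x{cpu_ratio:.1f}, iterations x{iter_ratio:.1f}")
```

The largest iteration ratio occurs on the 24-cell grid and the CPU ratio is roughly 15 to 20 across the three grids.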

Table  2  gives  convergence  histories  for  the  same  flow  cases  using  the  semi-implicit  and  fully  implicit
schemes.  The  iteration  count  is  much  smaller  with  both  the  semi-implicit  and  the  fully  implicit  algorithm 
than  with  the  explicit  scheme.  However,  each  iteration  takes  proportionally  more  CPU  time,  resulting  in 
similar  efficiency.  Note  that  SINSS  iteration  counts,  especially,  appear  to  be  unaffected  by  the  number 
of  normal  grid  cells. 

Table 2: Inviscid Solutions with SINSS and Beam & Warming Scheme


5.2 Viscous Flow Cases


In viscous flow cases, the accurate prediction of skin friction is usually of importance. To ensure that
this quantity is converged we require, in addition to (23), that the percentage change in skin friction
coefficient over time is small:

$$\sum_{3\ \mathrm{iterations}} \left|\frac{\%\Delta C_f}{\Delta t}\right| < 5 \times 10^{-4} \qquad (24)$$

where $\%(\Delta C_f) = (C_f^{n} - C_f^{n+1})/C_f^{n}$ and $\Delta t$ is a representative time step. Summing over three
iterations helps eliminate spurious small values of $\%\Delta C_f$ due to oscillatory convergence of the skin
friction coefficient.
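A minimal sketch of how such a check might be coded is given below. It assumes the criterion sums the magnitude of the fractional change in the skin friction coefficient, divided by a representative time step, over the last three iterations; the function name, the list-based history, and the default tolerance argument are hypothetical choices for this example.

```python
def skin_friction_converged(cf_history, dt, tol=5.0e-4):
    """Check the summed fractional change of C_f over the last three iterations.

    cf_history: skin friction coefficients, one per iteration, oldest first.
    dt: representative time step (assumed constant over the three iterations).
    """
    if len(cf_history) < 4:
        return False  # three changes need four stored values
    recent = cf_history[-4:]
    total = 0.0
    for old, new in zip(recent, recent[1:]):
        total += abs((old - new) / old) / dt
    return total < tol
```

Summing three consecutive changes means that a single iteration in which an oscillating coefficient happens to change very little is not mistaken for convergence.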

Reynolds numbers (based on channel height) are $2 \times 10^3$, $10^4$, and $10^6$. All cases have 48 cells in
the streamwise direction and 24, 32, or 32 cells across the half-channel. The grids were generated with
stretching ratios $\Delta y_{j+1}/\Delta y_j$ of 1.12, 1.15, or 1.18, respectively.

Iteration histories and CPU requirements are given in Table 3. The second column gives the
acceleration mechanism (RS and/or MG) for the explicit code. The semi-implicit code used only residual
smoothing.

SINSS does significantly better than the explicit or the implicit scheme at all Reynolds numbers.
However, at $Re = 10^6$ an implicit smoothing coefficient value of 0.2 was necessary for convergence.
The implicit scheme converged with difficulty at the highest Reynolds number, despite attempts with a
variety of parameter values. The grid stretching at high Reynolds numbers renders the explicit solver
increasingly stiff and decreases the effectiveness of multigrid. The Reynolds number $10^6$ case converged
only without multigrid.

Table 3: Viscous Channel Flow

                         EXPLICIT           SINSS             B & W
Re         Accel.      CPU(m)   ITER    CPU(m)   ITER    CPU(m)   ITER
2 x 10^3   ?              99     493       66     109       75     298
           ?             219    1678       71     118        -       -
10^4       MG & RS       366    1385       69      85      142     421
           no acc.       574    3345       98     121        -       -
10^6       RS              ?       ?      136       ?      550    1500
           no acc.         ?       ?      232       ?        -       -


6  Conclusions 

A  semi-implicit  algorithm  for  solving  the  thin  layer  Navier-Stokes  equations  in  finite  volume  form 
is  presented.  The  method  retains  much  of  the  flexibility  of  explicit  schemes  while  eliminating  the 
numerical  stiffness  due  to  the  disparate  physical  scales  found  in  typical  viscous  calculations.  It  is 
applied  to  viscous  and  inviscid  flows  and  is  compared  to  a  fully  implicit  scheme  and  an  explicit  scheme 
equipped  with  multigrid  and  residual  smoothing.  While  the  explicit  scheme  is  slightly  more  efficient  for 
inviscid  solutions  on  coarse  grids,  the  semi-implicit  algorithm  is  up  to  5  times  more  efficient  than  the 
explicit  and  implicit  schemes  on  the  computed  viscous  cases. 

The  algorithm  is  easy  to  implement  on  vector  and  parallel  architecture  machines,  and  preliminary 
calculations  with  a  fully  vectorised  code  have  been  made  on  a  Cray  XMP.  An  attractive  possibility 
is  to  use  an  explicit  solver  in  the  outer  inviscid  flow  coupled  to  SINSS  in  the  viscous  layer.  Future 
computations will focus on turbulent flows on highly stretched grids and extension to three dimensions.


1 Abstract

An adaptive finite element algorithm for solving the
steady Euler equations is presented. The algorithm uses
quadrilateral elements and allows embedded meshes. An
analysis showing how many finite volume and finite difference
methods can be viewed as finite element methods is
presented, and the methods are compared. A higher-order
approximation based on biquadratic interpolation is introduced,
and its usefulness is demonstrated. Examples including
channel flow and flow in a scramjet inlet demonstrate the
utility of unstructured grids, adaptation, and higher-order
elements.

2  Introduction 

Numerical solution of the Euler equations describing the
dynamics of an inviscid, compressible, ideal gas is becoming
an important tool for the practicing aerodynamicist [1].
Many algorithms have been proposed for the solution of
the Euler equations under various names (cell-centered finite
volume, node-centered finite volume, finite element, to
name but a few), and many authors argue the virtues of
each approach. We believe that the important distinction
is not finite volume vs. finite element vs. finite difference,
but rather structured mesh vs. unstructured mesh. For the
remainder of this paper, we will refer to any unstructured
mesh algorithm as a finite element algorithm. The main
advantage of finite element methods is geometric flexibility.
They allow complex geometries to be treated in a straightforward
manner, and allow for the use of grid adaptation.

In this paper, we examine some of the differences and
similarities of various formulations of the finite element
method. Section 3.1 describes three algorithms, which we
call Galerkin [2,3,4], cell-vertex, and central difference, and
shows how they fit into the finite element framework. For
example, on grids with parallelogram elements, the central
difference algorithm described is equivalent to Jameson's
cell-centered finite volume method [5]. The cell-vertex
method [6,7] gives the same difference stencil as a node-based
finite volume scheme, or as the first step of a Ni
scheme [8]. We hope this will reduce the confusion in finite
element/volume/difference nomenclature and put the
focus on the algorithms.

We also examine the use of higher-order (biquadratic)
elements. We have developed a Galerkin formulation using
biquadratic interpolation functions, and demonstrate its
usefulness for typical problems. A mesh of biquadratic elements
requires many fewer elements in smooth regions of a
flow, but can have mild problems near discontinuities. We
are exploring the use of mixed bilinear and biquadratic elements.

Much work has been done on the use of adaptive grids,
either by embedding [4,9,10,11], grid regeneration [12,13], or
grid redistribution. We explore further the use of adaptive
gridding, introducing and expanding the idea of directional
embedding as proposed by Kallinderis and Baron [14].

Next, we show some examples of the ideas presented here.
The basic examples we use are the flow over a 4% circular
arc bump at Mach 1.4 and the flow over a 10% circular
arc bump at Mach 0.68. We show some comparisons between
the Galerkin, cell-vertex and central difference methods,
and demonstrate the usefulness of the biquadratic element.
We also demonstrate adaptive gridding for a scramjet
inlet, and we show how adaptation can reduce a problem's
sensitivity to a poor initial grid. Finally, we demonstrate
directional embedding.


3  Solution  Algorithm 

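In this study, the two-dimensional Euler equations describing
the flow of an inviscid, compressible fluid are considered.
To allow the capture of shocks and other discontinuous
phenomena, the Euler equations are written in conservative
vector form as

$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = 0$$

$$U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho e \end{bmatrix}, \quad
F = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u h \end{bmatrix}, \quad
G = \begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v h \end{bmatrix}$$
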
where $e$ is total energy, $p$ is pressure, $\rho$ is density, $u$ and $v$ are
the $x$ and $y$ flow velocities, $U$ is a vector of state variables,
$F$ and $G$ are flux vectors in the $x$ and $y$ directions, and $h$ is
the total enthalpy, given by the thermodynamic relation

$$h = e + \frac{p}{\rho} \qquad (3)$$

In addition, one requires the equation of state

$$\frac{p}{\rho} = (\gamma - 1)\left[e - \tfrac{1}{2}(u^2 + v^2)\right] \qquad (4)$$

where the specific heat ratio $\gamma$ is taken as a constant (1.4)
for all calculations reported.


3.1 Spatial discretisation

The finite element approach to discretising these equations
divides the domain into elements determined by some
number of nodes (in the current implementation, 4 nodes
make up a bilinear element, and 9 nodes make up a biquadratic
element). These elements can be of any shape.
The shapes most commonly used are triangles and quadrilaterals.
In this paper, we use only quadrilateral elements.
A quadrilateral (or hexahedral) grid will require fewer elements
for a given number of nodes than will a triangular
(or tetrahedral) grid, and hence will be less expensive to
compute and/or require less memory. Also, quadrilateral
elements are slightly more accurate [15]. However, there is
nothing inherent in the formulation of the algorithm presented
here that is limited to a particular element topology.

Within each element the state vector $U^{(e)}$ and flux vectors
$F^{(e)}$ and $G^{(e)}$ are written

$$U^{(e)} = \sum_i N_i^{(e)} U_i^{(e)}, \quad F^{(e)} = \sum_i N_i^{(e)} F_i^{(e)}, \quad G^{(e)} = \sum_i N_i^{(e)} G_i^{(e)}$$

where the $U_i^{(e)}$, $F_i^{(e)}$, and $G_i^{(e)}$ are the nodal values of the
state and flux vectors in element $e$ and the $N_i^{(e)}$ are a set of
interpolation functions on that element. These interpolation
functions have the following properties:

1. each function is 1 at one node and 0 at all other nodes
in the element;

2. each function is 0 outside the element;

3. the sum of all the interpolation functions in the element
is 1 everywhere in the element.


Polynomials are usually used for the interpolation functions.
These polynomials are expressed in terms of local coordinates
$(\xi, \eta)$, which are related to $(x, y)$ by an isoparametric
transformation. Thus, inherent in the formulation that
follows are some transformational metrics, which are not
shown for clarity.
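As an illustration of such interpolation functions, the sketch below writes out the usual bilinear functions on the reference square in local coordinates. The node ordering (counter-clockwise from the lower-left corner) and the function names are assumptions for this example, not taken from the paper's implementation.

```python
# Local coordinates of the four corner nodes of the reference square
# [-1, 1] x [-1, 1]; the ordering is an assumption for this sketch.
NODES = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def shape_functions(xi, eta):
    """Return [N1, N2, N3, N4] evaluated at the local point (xi, eta)."""
    return [0.25 * (1.0 + xi_i * xi) * (1.0 + eta_i * eta) for xi_i, eta_i in NODES]


def interpolate(nodal_values, xi, eta):
    """Interpolate a scalar inside the element from its four nodal values."""
    return sum(n * u for n, u in zip(shape_functions(xi, eta), nodal_values))
```

Each function equals 1 at its own node and 0 at the other three, and the four functions sum to 1 anywhere in the element, so a constant field is reproduced exactly.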


These expressions can be differentiated to obtain an expression
for the derivative in each element in terms of the
nodal values (shown here for the state vectors)

$$\frac{\partial U^{(e)}}{\partial x} = \sum_{i=1}^{NNE} \frac{\partial N_i^{(e)}}{\partial x} U_i^{(e)}$$

where NNE is the number of nodes in that element. The
flux vector derivatives are calculated the same way.


The expression for the derivatives is substituted into
equation (2) and summed over all elements to obtain

$$N \frac{dU_i}{dt} = -\frac{\partial N}{\partial x} F_i - \frac{\partial N}{\partial y} G_i \qquad (9)$$

where $N$ is now a global vector of interpolation functions,
determined by summing the interpolation functions for each
element.


The next step can be thought of as a projection onto the
space spanned by some other functions $N'$, called test functions,
such that the error in the discretisation is orthogonal
to the space spanned by the test functions. For more detail
on the mathematics involved see [16]. To do this, multiply
Eq. (9) by $N'^T$ and integrate over the entire domain. This
results in the semi-discrete equation

$$\left[\iint N'^T N \, dV\right] \frac{dU_i}{dt} = -\left[\iint N'^T \frac{\partial N}{\partial x} \, dV\right] F_i - \left[\iint N'^T \frac{\partial N}{\partial y} \, dV\right] G_i \qquad (11)$$

or

$$M \frac{dU_i}{dt} = -R_x F_i - R_y G_i \qquad (12)$$

where $M$ is the consistent mass matrix and $R_x$ and $R_y$
are what we call residual matrices. The mass matrix $M$
is sparse, symmetric, and positive definite, but not structured,
so it is replaced by a lumped (diagonal) mass matrix
$M_L$ in which each diagonal entry is the sum of all the elements
in the corresponding row of $M$. This allows Eq. 12
to be solved explicitly. The lumping does not change the
steady-state solution, but does modify the time behavior of
the algorithm.

3.1.1 Choice of Test Functions

Various choices for $N'$ are possible, each giving rise to
a particular discretisation. If one chooses each $N'^{(e)}_i$ to be
the corresponding $N^{(e)}_i$, one obtains the Galerkin Finite Element
approximation [4]. This approximation has several
interesting features. First, it gives the minimum steady-state
error, since there is no component of that error in the
space of the interpolation functions.

Second, for bilinear functions, on a uniform mesh, for the
Euler equations, in steady-state, it is a fourth-order accurate
approximation. Finally, on certain mesh topologies, it can
be viewed as a finite volume approximation, where the line
integral is taken around a set of 4 elements, assuming that
the integrand varies parabolically along each edge.

If one chooses each $N'^{(e)}_i$ to be a constant, on bilinear
meshes one obtains the 'cell-vertex' approximation [7].
This approximation is identical to a node-based finite
volume method.

Finally, if one chooses each $N'^{(e)}_i$ to be a Dirac delta
function at node $i$, one obtains the central difference or collocation
approximation. For the bilinear case on a mesh of
parallelograms, this is identical to a cell-based finite volume
method. If the mesh is not a mesh of parallelograms,
this equivalence does not hold. The practical differences
between these methods will be discussed in section 4.


3.2  Wall  boundary  condition 

At walls, the portions of the flux vectors representing convection
normal to the wall are set to zero to enforce the flow
tangency condition. The equation for the fluxes is then

where $u_m$ and $v_m$ are corrected velocities such that the total
convective contribution normal to the wall will be 0. These
velocities at each node are

$$u_m = u(1 - n_x^2) - v n_x n_y \qquad (14)$$

$$v_m = v(1 - n_y^2) - u n_x n_y \qquad (15)$$

where $n_x$ and $n_y$ are the components of the unit normal.
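Equations (14) and (15) amount to removing the wall-normal component of the velocity. The sketch below evaluates them for one node; the function name and argument order are illustrative assumptions, and the normal is assumed to be already normalised.

```python
def wall_velocity(u, v, nx, ny):
    """Corrected wall velocities per Eqs. (14)-(15).

    (nx, ny) must be a unit normal; the result has no component along it.
    """
    u_m = u * (1.0 - nx * nx) - v * nx * ny
    v_m = v * (1.0 - ny * ny) - u * nx * ny
    return u_m, v_m


if __name__ == "__main__":
    u_m, v_m = wall_velocity(2.0, -1.0, 0.6, 0.8)
    # The normal component of the corrected velocity should vanish.
    print(u_m * 0.6 + v_m * 0.8)
```

For a wall aligned with the x axis, with normal (0, 1), the correction simply zeroes the v component and leaves u unchanged.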

There is an option in the code either to enforce flow tangency
after each iteration, or to allow this condition to be
reached only in steady state. If flow tangency is enforced,
convergence rates are improved slightly, and the robustness
is improved greatly. For example, without enforcing flow
tangency, we were limited to free stream Mach numbers of
about 5.5, but with flow tangency enforced explicitly, we
have successfully run problems with free stream Mach numbers
as high as Mach 8. We enforce flow tangency in most
cases.


3.3  Far-field  boundary  condition 


A one-dimensional characteristic treatment is used on the
far field boundary. From the inward-directed unit normal
vector $\hat{n}$, the unit tangent vector $\hat{t}$ and the normal and tangential
velocities $u_n$ and $u_t$ are calculated. The 1-D Riemann
invariants (and the corresponding wave speeds) are

If there is no entropy variation normal to the boundary,
these invariants are exact, otherwise they are approximate.
At each point on the boundary, the invariants are calculated
using the solution state vector $U$ and the free stream state
vector $U_\infty$. Then a decision is made based on the sign of the
corresponding wave speed on whether to use the invariant
based on the current state or the invariant based on the
free stream. If the relevant wave speed is positive (e.g., in
supersonic inflow, all 4 characteristic speeds are positive)
then the free stream value is used.
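The selection logic described above can be sketched as follows. This is an illustrative sketch only: it assumes the standard one-dimensional invariants for a perfect gas ($u_n \pm 2a/(\gamma - 1)$, entropy $p/\rho^\gamma$, and $u_t$), evaluates the wave speeds from the interior state, and uses hypothetical function names and a (density, normal velocity, tangential velocity, pressure) tuple for each state.

```python
import math

GAMMA = 1.4  # specific heat ratio used for all calculations in the paper


def riemann_invariants(rho, un, ut, p):
    """Standard 1-D invariants of a state (density, normal vel., tangential vel., pressure)."""
    a = math.sqrt(GAMMA * p / rho)
    return [
        un + 2.0 * a / (GAMMA - 1.0),  # travels at un + a
        un - 2.0 * a / (GAMMA - 1.0),  # travels at un - a
        p / rho ** GAMMA,              # entropy, travels at un
        ut,                            # tangential velocity, travels at un
    ]


def far_field_state(interior, free_stream):
    """Blend interior and free-stream invariants and return primitive variables."""
    rho, un, ut, p = interior
    a = math.sqrt(GAMMA * p / rho)
    speeds = [un + a, un - a, un, un]  # measured along the inward normal
    inside = riemann_invariants(*interior)
    outside = riemann_invariants(*free_stream)
    # Positive speed: the wave enters the domain, so the free-stream value is used.
    r_plus, r_minus, entropy, ut_b = [
        o if c > 0.0 else i for c, i, o in zip(speeds, inside, outside)
    ]
    un_b = 0.5 * (r_plus + r_minus)
    a_b = 0.25 * (GAMMA - 1.0) * (r_plus - r_minus)
    rho_b = (a_b * a_b / (GAMMA * entropy)) ** (1.0 / (GAMMA - 1.0))
    p_b = rho_b * a_b * a_b / GAMMA
    return rho_b, un_b, ut_b, p_b
```

With supersonic inflow all four speeds are positive and the function returns the free stream state; with supersonic outflow it returns the interior state unchanged.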


The invariants are transformed back into a temporary set
of primitive variables, and these primitive variables are used
to calculate the fluxes for use in the residual calculation. An
alternative is to update the state vector based on the new
invariants at this time. Updating the state vectors each
time the boundary condition is calculated seems to improve
the robustness of the method, especially for biquadratic elements.


3.4  Smoothing 

To capture shocks and stabilize the scheme, an artificial
viscosity needs to be added. Currently, the smoothing used
consists of a fourth-difference term and a pressure-switched
second-difference term similar to that discussed by Rizzi
and Eriksson [17]. Due to the unstructured nature of the
grids, a Laplacian type of second difference is used, instead
of normal and tangential or $\xi$ and $\eta$ differences.

The heart of the smoothing method is the calculation of
an elemental contribution to a second difference. There are
two ways we have explored for doing this. The first method,
suggested by Ni [8], is relatively fast, conservative and robust
(it is dissipative on any element geometry), but gives
a non-zero contribution to the second difference for a linear
function on a non-uniform grid. The second method, proposed
by Mavriplis [18], is more expensive, not conservative,
less robust (it can be anti-dissipative if the element is not
convex), but always results in zero contribution from linear
functions on non-uniform grids.

Figure 1: Weights for Second Difference at Node 1

3.4.1  Calculation  of  a  Conservative,  Low-Accuracy  Second 
Difference 

Figure  1  shows  the  contribution  of  a  typical  element  to 
the  second  difference  at  node  1.  The  numbers  inside  the 
box  are  the  node  numbers,  the  numbers  outside  are  the 
weights.  The  elemental  contribution  to  a  node  is  obtained 
by  subtracting  the  value  at  the  node  from  the  average  value 
in  the  element.  The  elemental  contributions  are  summed 
to  give  the  second  difference  at  the  node.  The  elemental 
contributions  can  also  be  multiplied  by  a  scale  factor  before 
being  summed  to  the  node.  This  is  how  the  switched  second 
difference  is  calculated. 
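The procedure just described can be sketched as an element loop that scatters contributions to the nodes. The data layout (a flat list of nodal values plus a list of four-node connectivity lists), the optional per-element `scale` argument, and the helper `quad_grid` are assumptions made for this example.

```python
def second_difference(values, elements, scale=None):
    """Sum, at every node, (element average - nodal value) over the elements touching it."""
    d2 = [0.0] * len(values)
    for e, nodes in enumerate(elements):
        average = sum(values[n] for n in nodes) / len(nodes)
        weight = 1.0 if scale is None else scale[e]  # e.g. an elemental switch
        for n in nodes:
            d2[n] += weight * (average - values[n])
    return d2


def quad_grid(nx, ny):
    """Connectivity of an nx-by-ny block of quadrilaterals (hypothetical helper)."""
    row = nx + 1
    return [
        [j * row + i, j * row + i + 1, (j + 1) * row + i + 1, (j + 1) * row + i]
        for j in range(ny)
        for i in range(nx)
    ]
```

Within each element the four contributions cancel, so the sum over all nodes is zero whatever the element weights are; this is the conservation property. On a uniform grid a linear field also gives zero at interior nodes, which, as noted earlier, is no longer guaranteed on a non-uniform grid.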


3.4.2  Calculation  of  a  Non-Conservative,  High-Accuracy 
Second  Difference 

This method divides the element into 4 triangles. Figure 2
shows how the element is divided. The figure shows
the element in dashed lines, with the triangle outlined in
the solid line. The first derivatives are calculated by a line
integral around the triangle. Triangle integration is used
because the stencil that results from the integration around
the entire quadrilateral does not damp the double-sawtooth
eigenmode of the residual operator. This derivative is integrated
again around an appropriate polygon to get the
second difference. The polygon and the integration direction
are shown in Fig. 3 for a node in the interior and a node
on the boundary. Note that each dashed box represents an
element.

At node 1, for example, the contribution from an interior
element is

$$U_{xx} + U_{yy} = \frac{y_4 - y_2}{2A}\left[U_1(y_2 - y_4) + U_2(y_4 - y_1) + U_4(y_1 - y_2)\right]
+ \frac{x_4 - x_2}{2A}\left[U_1(x_2 - x_4) + U_2(x_4 - x_1) + U_4(x_1 - x_2)\right] \qquad (17)$$

where $A$ is the area of triangle 1-2-4, and $(x_i, y_i)$ are the coordinates
of the $i$th node. For an element on the boundary,
the term in front, corresponding to the second integration,
is changed. For example, for an element with the 1-2 face
on a boundary, the term in front would be $x_4 - x_1$ instead
of $x_4 - x_2$, corresponding to an integration around two sides
of the triangle instead of one. Note that only one factor of
$A$ is used, since we want a second difference, not a second
derivative.

3.4.3  Combined  smoothing 

To calculate the complete smoothing for a time step, we
first calculate the nodal second difference of pressure by either
of the 2 methods. This is turned into an elemental
quantity by simple averaging. The elemental second difference
is then normalised by an elemental pressure average to
form an elemental switch. The second-difference smoothing
term is the weighted second difference using the first method
above, multiplied by a constant between 0 and 0.05. The
fourth-difference smoothing term is the second difference
(by the first method) of the second difference (by either
method) multiplied by a constant between 0.001 and 0.05.
The sum of these two terms is added directly into the time
integration of Eq. 18. The smoothing is globally conservative,
i.e., the total contribution over the entire domain is
zero. This can result in a convective character near boundaries,
but this does not affect the solutions adversely. The
smoothing of section 3.4.3 is used in all test cases following
except for the scramjet calculations, which use the smoothing
of Section 3.4.1.

Figure 3: Integration Polygons for Smoothing Calculation

The  choice  of  smoothing  method  is  of  great  importance. 
Calculations  done  by  Lindquist  and  Giles  [19]  indicate  that
the  accuracy  of  a  complete  method  for  solving  the  Euler 
equations  can  depend  more  on  the  smoothing  than  on  the 
basic  difference  algorithm.  We  have  done  tests  and  have 
confirmed  this.  For  instance,  the  Galerkin  and  cell-vertex 
methods  are  second  order  accurate  when  the  high  accuracy 
smoothing  is  used,  and  both  are  first  order  accurate  when 
the  low  order  smoothing  is  used.  This  indicates  to  us  that 
there  is  a  need  for  further  study  of  artificial  viscosity  models. 


3.5 Time Integration

To integrate equation (13), the following multi-step
method is used:

where $R_i(U)$ is the right-hand side of Eq. (12) with the
fluxes based on state vector $U$, $V_i$ is a smoothing term (described
above), $M_{L_i}$ is the entry in the lumped mass matrix
for node $i$, and $\lambda$ is the CFL number. Local time stepping
is used to accelerate convergence, with the time step given
by

$$\Delta t_i = \frac{\Delta x_i}{|u| + a} \qquad (19)$$

where $\Delta x_i$ is some nodal characteristic length, and $u$ is the
flow velocity at the node. In our algorithm, for $\Delta x_i$ we use
the minimum (over all elements containing the node) of the
average lengths of opposite sides of the element.

In one dimension, for stability $\lambda$ must be less than $2\sqrt{2}$.
In two dimensions, a linear stability analysis of the wave
equation on a uniform mesh gives the following stability
limits: $\lambda$ must be less than 1.93 for the Galerkin method,
2.17 for the cell-vertex method, and 1.41 for the central
difference method. In practice, this estimate is a very conservative
one, since the $\Delta x_i$ we calculate is usually smaller
than the characteristic length limiting the stability. In the
current implementation, the smoothing is computed at the
first stage and is "frozen" for remaining stages.


4 Comparison of the Methods

This section compares the various methods (Galerkin,
cell-vertex, and central difference) and discusses some of the
important differences between them. An extensive analytic
comparison of the methods will be presented in [30], so we
will only present the important points here.


4.1 Analytic Comparison

A significant source of error in the solution of flows with
supersonic regions is the dispersive error associated with
the discrete system of equations. This error appears as low-frequency
oscillations either ahead of or behind shocks, depending
on Mach number and element aspect ratio. For
the grids used in the problems, these oscillations should be
ahead of the shock for the cell-vertex scheme where the local
Mach number is greater than $\sqrt{2}$, and for the central
difference and Galerkin methods when the Mach number
is less than $\sqrt{2}$. Conversely, the oscillations should be behind
the shock for the cell-vertex scheme if the local Mach
number is less than $\sqrt{2}$, and for the central difference and
Galerkin methods where the Mach number is greater than
$\sqrt{2}$. These effects are most apparent when the artificial
damping is small. For most practical problems, the damping
is large enough to swamp out these errors.


4.2 Numerical Comparison

To see the practical differences between the methods, two
test problems were run with each of the 3 methods on the
same grid. The test cases were Mach 0.68 flow in a channel
over a 10% circular arc bump on a 60x20 grid and Mach 1.4
flow in a channel over a 4% circular arc bump on 60x20 and
90x30 grids.

In the transonic case, there is very little difference in the
solutions. Very near the shock, there is a small amount of
dispersive error, but this is rapidly killed off because the
flow is supersonic only over a small region. Figure 6 shows
the Mach number along the bump for the three methods.

In the supersonic case, coarse mesh, the three methods
exhibit different behavior on the surface. Figures 7-9 show
the surface Mach numbers for the coarse grid case. The solutions
are quite different (note the plateau-like feature after
the leading-edge shock for the cell-vertex case) and not very
good. Interestingly, the difference between the methods is
much smaller at mid-channel. Figure 10 shows that in the
middle of the channel, the solutions are almost identical.

The differences for the supersonic case also are reduced
with increasing mesh refinement. Figure 11 shows that for
the 90x30 grid, the surface Mach numbers are almost identical
again. This illustrates two important points. First, it
is important to have enough resolution, and second, if one
has enough resolution, there is not a great deal of difference
between the methods for these problems.

One important characteristic of each method is the CPU
time required to calculate the residual. If one normalises the
CPU time required to calculate everything but the residual
(fluxes, smoothing, boundary conditions and updating) to 1
unit, the Galerkin method requires 0.8 units, the cell-vertex
0.5 units, and the central difference 0.9 units for our implementation.
Therefore, the total time for a problem can
range from 1.5 to 1.9 units. Of course, the CPU timings
can vary depending on the exact details of the implementation.
We feel that we have implemented the methods in
the most efficient manner. It should also be noted that the
stability limits for the Galerkin and cell-vertex methods are
much better than the limit for the central difference method.
When all factors are taken into account, the Galerkin and
cell-vertex methods seem to be pretty close to each other,
with the central difference method a not-too-close third.
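One rough way to combine the two factors is to divide the cost of one iteration by the CFL limit quoted in Section 3.5. This is only an illustrative estimate: it assumes every method is run at its linear stability limit and that the number of iterations to convergence scales inversely with the CFL number, neither of which is claimed in the text. The names `METHODS` and `cost_per_unit_time` are hypothetical.

```python
# (residual cost in units, 2-D CFL stability limit) for each method.
METHODS = {
    "Galerkin": (0.8, 1.93),
    "cell-vertex": (0.5, 2.17),
    "central difference": (0.9, 1.41),
}


def cost_per_unit_time(residual_cost, cfl_limit, base_cost=1.0):
    """Work per iteration divided by the largest stable CFL number."""
    return (base_cost + residual_cost) / cfl_limit


if __name__ == "__main__":
    for name, (residual_cost, cfl_limit) in METHODS.items():
        print(f"{name:18s} {cost_per_unit_time(residual_cost, cfl_limit):.2f}")
```

Under these assumptions the central difference method comes out nearly twice as expensive as the cell-vertex method, with the Galerkin method in between.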


5  Biquadratic  Elements 

In many problems, the flow may be quite smooth over
large portions of the domain. Such problems may benefit
from the use of higher-order elements. One element we
have been working with is a subparametric*, biquadratic
element. Figure 4 shows the element node numbering. The
element is subparametric because the positions of nodes 5-9
are functions of the positions of nodes 1-4 (midside nodes
are at the midpoint of the side, and the center node is at
the "average" position of nodes 1-4). The subparametric restriction
allows one to describe the element with 8 geometric
parameters, instead of 16 for the full biquadratic element.

*A subparametric element is one in which the geometry is
interpolated with a polynomial of lower degree than the state
vectors or fluxes.

Figure 4: Biquadratic Element Geometry
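The dependence of nodes 5-9 on the corner nodes can be sketched as below. The ordering assumed here (node 5 on side 1-2, node 6 on side 2-3, node 7 on side 3-4, node 8 on side 4-1, node 9 at the center) is a guess made for illustration, since the numbering of Figure 4 is not reproduced in the text; the function name is likewise hypothetical.

```python
def biquadratic_nodes(corners):
    """Return the 9 node positions of a subparametric biquadratic element.

    corners: four (x, y) tuples for nodes 1-4, listed in order around the element.
    """
    midsides = []
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        midsides.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0))
    center = (
        sum(x for x, _ in corners) / 4.0,
        sum(y for _, y in corners) / 4.0,
    )
    return list(corners) + midsides + [center]
```

Only the four corner positions, that is 8 numbers, need to be stored; the remaining five positions are recomputed when needed.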

Element residual calculation is performed exactly as described
above for the bilinear elements. We have experimented
with several ways to do the smoothing. Most of
our efforts have met with failure. Our current method for
smoothing calculation is to treat each biquadratic element
as four bilinear elements, and to use the method of Section
3.4.1 for calculating the second differences. We plan on
implementing the method of Section 3.4.2 but have not yet
done so.

The stability limit cannot be computed with the standard
Fourier methods, as the difference stencil is different at corner
nodes, midside nodes and the center node. Numerical
experiments indicate that the CFL limit for the Galerkin
method is about 0.8. In the case of the biquadratic elements,
the cell-vertex and central difference methods seem
to be unstable.

Figures 13 and 15 show the pressure contours for the circular
arc bump problems discussed above. The biquadratic calculations
were done on a 24x8 grid (Fig. 12). Note that the shocks in both
cases are captured well. The surface pressure plots (Figs. 14
and 16) show that the biquadratic elements have larger
overshoots at the shocks, as expected. The surface plots also
indicate the presence of some spurious high frequency
oscillations. The artificial viscosity for the biquadratic
elements is still an area for research.

The supersonic case required 272 iterations and 1.2 minutes of
CPU time on an Alliant FX/8 computer with 3 CEs. For comparison,
a 1200 element bilinear Galerkin case (comparable accuracy)
requires about 370 iterations and 2.3 minutes of CPU time. The
transonic case converged in 725 iterations and used 3 CPU
minutes on the same machine. For comparison, a 1200 element
bilinear case took 5 CPU minutes. This program is fully
vector-concurrent on the Alliant FX/8, demonstrating the
suitability of the algorithm for high performance
supercomputers. Due to the unstructured mesh, gather-scatter
performance is a major architectural concern.


6  Adaptation 

In order to make the best use of computational resources, it is
desirable to put the points where the flow has interesting
features. This involves detecting the feature, and deciding how
to adapt. The details of managing the grid data structure during
adaptation are too involved to be discussed here.


6.1 Adaptation Procedure

In a typical problem, one starts out with an idea of what the
flow will look like, but one may not know exactly where the
features lie. The adaptive approach starts with a fairly coarse
initial grid, coarse enough to be cheap to compute, yet fine
enough so that most of the essential features can appear. The
first step is to compute a solution on the coarse grid. It is
usually not necessary to run this solution to convergence, and
we usually run it "halfway" to convergence, that is, until the
RMS change is the square root of the converged value. Then, an
adaptation parameter is calculated, and cells are flagged either
for refinement or unrefinement. The flagged cells at the
coarsest level are divided, then those at the next coarsest,
etc. After all division is done, unrefinement proceeds from the
finest level to the coarsest. After all unrefinement has been
done, the grid geometry is recalculated, along with some
quantities used for vectorization. The solution is interpolated
onto the new grid, and the calculation proceeds. In a typical
problem, an adaptation takes about as much time as 2 or 3
iterations, but since adaptation occurs infrequently, the time
involved is not significant.
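The "halfway to convergence" rule can be written down directly. The sketch below is only an illustration: the function names and the default tolerance of 1.0e-6 are hypothetical, and the RMS change is assumed to be normalized so that the converged value is less than one.

```python
import math

def halfway_tolerance(converged_rms):
    """RMS-change level treated as "halfway" to convergence.

    Following the text, this is the square root of the RMS change that
    would be accepted as fully converged (for example 1e-6 -> 1e-3).
    """
    return math.sqrt(converged_rms)

def ready_to_adapt(rms_change, converged_rms=1.0e-6):
    # converged_rms is a hypothetical tolerance chosen for illustration.
    return rms_change <= halfway_tolerance(converged_rms)
```

On a logarithmic convergence plot the square root sits exactly midway between an RMS change of one and the converged value, which is presumably why the rule is described as "halfway".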

6.2 Adaptation Criteria

There is a good amount of literature about the best choice of
adaptation parameter. Dannenhoffer and Baron [9] suggest using
the first difference of density as the criterion for adaptation.
Lohner [10] suggests using a more complex indicator, which is
essentially a second difference of density scaled by a first
difference of density. Berger et al. [21] use Richardson
extrapolation to estimate the errors. We have found that the
first difference of density tends to give better grids for
transonic flows, while the scaled second difference works better
for flows with many shocks at high Mach number. This bears out
the experiences of these authors.

Our  first-difference  indicator  (used  for  transonic  flows)  is 
calculated  as  follows: 

1. In each element, calculate the absolute value of the first
difference of density (the exact details are not critical, as
long as all elements are treated identically).

2. Compute the mean and standard deviation of this quantity.

3. Normalize this quantity by subtracting the mean and dividing
by the standard deviation. If the standard deviation is small,
use some arbitrary value instead (currently 0.05 times the mean
or .0005, whichever is larger).

4. Compute the median, and subtract it from the parameter.

5. If the scaled parameter is greater than some threshold value,
try to refine that element. If it is less than another threshold
value, try to collapse the element.

We have found that refine thresholds near .3 and collapse
thresholds near -.3 give good results. The shifting by the
median is performed after the normalization because it is much
easier to calculate the median of a quantity when one has a
better idea of the range. The shifting is necessary because the
distribution of the adaptation parameter can become very skewed
as the calculation progresses.
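The five steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the results: the function name is hypothetical, the per-element first differences are taken as given, and the guard value of step 3 is applied as a lower bound on the standard deviation, which is one reading of "if the standard deviation is small".

```python
import numpy as np

def first_difference_flags(drho, refine_thresh=0.3, collapse_thresh=-0.3):
    """Flag elements from the first difference of density in each element.

    Returns +1 (try to refine), -1 (try to collapse) or 0 for each element.
    """
    q = np.abs(np.asarray(drho, dtype=float))        # step 1
    mean, std = q.mean(), q.std()                    # step 2
    # Step 3: normalize, guarding against a small standard deviation.
    # Treating the guard as a floor is an assumption of this sketch.
    floor = max(0.05 * mean, 0.0005)
    scaled = (q - mean) / max(std, floor)
    scaled = scaled - np.median(scaled)              # step 4
    flags = np.zeros(q.shape, dtype=int)             # step 5
    flags[scaled > refine_thresh] = 1
    flags[scaled < collapse_thresh] = -1
    return flags

flags = first_difference_flags([0.0, 0.01, 0.02, 0.01, 0.5, 0.0])
```

In this small example only the element with the large density jump is flagged for refinement; the others sit close to the median and are left alone.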

Our  second-difference  based  parameter  is  computed  as 
follows: 

1.  Compute  a  nodal  second  difference  of  density  using  one 
of  the  methods  outlined  above. 

2.  Compute  a  nodal  first  difference.  Again,  the  details  are 
not  important. 

3.  Compute  an  average  density  at  the  nodes. 

4. Compute a nodal adaptation switch:

$$\mathrm{Switch} = \frac{|\text{second difference}|}{|\text{first difference}| + \epsilon \times \text{average}} \qquad (20)$$

where $\epsilon$ is a small parameter to "smooth" the switch in
smooth portions of the flow. Without it, small oscillations in
the flow tend to produce large values of the switch.

5. If the switch is greater than some value (usually around
0.15), divide; if it is less than some value (usually around
0.05), undivide.
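A minimal sketch of the switch of Eq. (20) and the divide/undivide decision is given below. The nodal first and second differences and the average density are taken as inputs, the function name is hypothetical, and the value of the small parameter (0.05 here) is an assumption, since the text only says that it is small.

```python
import numpy as np

def second_difference_flags(d2rho, d1rho, rho_avg, eps=0.05,
                            divide_thresh=0.15, undivide_thresh=0.05):
    """Return the nodal switch of Eq. (20) and the resulting flags.

    Flags are +1 (divide), -1 (undivide) or 0 (leave unchanged).
    eps is a hypothetical value for the small smoothing parameter.
    """
    d2rho = np.asarray(d2rho, dtype=float)
    d1rho = np.asarray(d1rho, dtype=float)
    rho_avg = np.asarray(rho_avg, dtype=float)
    switch = np.abs(d2rho) / (np.abs(d1rho) + eps * rho_avg)
    flags = np.zeros(switch.shape, dtype=int)
    flags[switch > divide_thresh] = 1
    flags[switch < undivide_thresh] = -1
    return switch, flags

switch, flags = second_difference_flags(
    d2rho=[0.0, 0.2, 0.01], d1rho=[0.0, 0.5, 0.1], rho_avg=[1.0, 1.0, 1.0])
```

The term proportional to the average density keeps the denominator away from zero in uniform regions, so that small oscillations there do not trigger division.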
6.3 Directional Adaptation


In many cases, the features of interest have strong variations
in only one direction. In cases like these, it is wasteful to
introduce points in the direction along which little variation
occurs. This leads to the concept of directional adaptation. The
original motivation in [14] was to use this idea for viscous
flows with boundary layers. We have tried to extend these ideas
to inviscid flows. The main difference is that in the boundary
layers, the rapid variations tend to be along grid lines, while
in the inviscid flows, the features (mainly shocks) can lie in
any direction. We have developed a criterion for determining
which way to divide a cell. The first step is to use some
criterion for determining which cells to subdivide. After this
is done, determine whether to divide into 4 cells or two cells
with the following procedure. Calculate


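$$\Delta_h = |\rho_4 - \rho_1| + |\rho_3 - \rho_2| \qquad (21)$$

$$\Delta_v = |\rho_2 - \rho_1| + |\rho_3 - \rho_4| \qquad (22)$$

$$D_h = \frac{\Delta_h}{\Delta_h + \Delta_v} \qquad (23)$$

$$D_v = \frac{\Delta_v}{\Delta_h + \Delta_v} \qquad (24)$$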

If $D_h$ is greater than some threshold (typically .2-.4),
divide the cell horizontally (parallel to the 1-2 face). If
$D_v$ is greater than some threshold, divide the cell vertically
(parallel to the 2-3 face). If both exceed their thresholds, or
neither exceeds its threshold, divide the cell into 4 subcells
as in regular embedding.
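The decision rule can be sketched as below. The pairing of corner densities in the two difference sums is an assumption of this example (the variation across the 1-2 face direction drives the horizontal division), as are the function name and the handling of a perfectly uniform cell.

```python
def division_direction(rho, thresh_h=0.3, thresh_v=0.3):
    """Choose how to divide a flagged cell.

    rho: densities at corner nodes 1-4, ordered around the cell.
    Returns "horizontal" (cut parallel to the 1-2 face), "vertical"
    (cut parallel to the 2-3 face) or "both" (regular 4-cell embedding).
    """
    r1, r2, r3, r4 = rho
    # Assumed node pairing: variation from face 1-2 to face 4-3 ...
    delta_h = abs(r4 - r1) + abs(r3 - r2)
    # ... and variation from face 1-4 to face 2-3.
    delta_v = abs(r2 - r1) + abs(r3 - r4)
    total = delta_h + delta_v
    if total == 0.0:
        return "both"  # uniform cell: no preferred direction (sketch choice)
    d_h, d_v = delta_h / total, delta_v / total
    exceeds_h, exceeds_v = d_h > thresh_h, d_v > thresh_v
    if exceeds_h == exceeds_v:
        return "both"  # both exceed, or neither exceeds
    return "horizontal" if exceeds_h else "vertical"
```

Because the two normalized measures sum to one, thresholds below 0.5 can never leave both below threshold, while thresholds above 0.5 can never let both exceed it.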

6.4 Embedded Interface Treatment

An important facet of the algorithm is the treatment of
interfaces between coarse and fine regions of the grid. Figure 5
shows a typical interface between a locally fine region and a
coarser region. The fluxes at the interface node (node 2) are
set to the average of the fluxes at nodes 1 and 3 before
residuals are calculated. The state vector at node 2 is set to
the average of the state vectors at nodes 1 and 3 after each
iteration. No other special treatment is applied at nodes 1, 2,
or 3. Thus, the overhead involved in having interfaces is
extremely small, amounting to less than 2% of the total time per
iteration for a typical mesh. This step is also fully
vectorized, so it will not be a limitation on a faster machine.
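A sketch of the interface treatment as a single gather-scatter operation is shown below. The storage layout (one row of node indices per interface, with the interface node in the middle) and the function name are hypothetical; the same routine would be applied to the fluxes before the residual calculation and to the state vectors after each iteration.

```python
import numpy as np

def average_interface_nodes(values, interface):
    """Set each interface node to the average of its two neighbors.

    values:    float array (n_nodes, n_vars) of fluxes or state vectors,
               updated in place.
    interface: integer array (n_interfaces, 3) of node indices
               (node 1, node 2, node 3), node 2 being the interface node.
    """
    n1, n2, n3 = interface[:, 0], interface[:, 1], interface[:, 2]
    # Gather the two neighbors, average, and scatter to the interface node.
    values[n2] = 0.5 * (values[n1] + values[n3])
    return values

state = np.array([[1.0, 2.0], [9.0, 9.0], [3.0, 6.0]])
average_interface_nodes(state, np.array([[0, 1, 2]]))
```

Written this way the step contains no loop over interfaces, which is consistent with the remark that it vectorizes fully.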

6.5 Examples

To illustrate these ideas, we show how adaptation can be used to
improve the solution on an intentionally distorted grid. We then
demonstrate directional adaptation. Finally, we show how
different problems can produce very different final grids from
the same initial grid by showing the adapted results from two
scramjet inlet calculations, one at Mach 5, and one at Mach
2.85.

Figure 5: Detail of embedded interface

An advantage of adaptation is that it reduces a problem's
sensitivity to the grid. Figure 17 shows a distorted grid for
the 10% bump problem. There are places in the grid where the
elements nearly degenerate into triangles. After running with 4
adaptations, and a total of 3 different mesh levels, the final
grid is shown in Fig. 18. Figures 19 and 20 show the pressure
contours and surface pressures on the final grid. These compare
well to other solutions on smoother meshes.

To demonstrate directional adaptation, we used as the test
example the transonic circular arc bump. The first-difference
adaptation criterion was used, the vertical threshold was set to
0.8, and the horizontal threshold was set to 0.8. The final grid
is shown in Fig. 21, and contours of pressure in Figure 22. We
found that all of the directional embedding results were
disappointing. The savings introduced by directionally adapting
are small, and the degradation of solution quality is large. The
key issue seems to be that in order for directional adaptation
to work well, the grids and features must be aligned. This is
the case in boundary layers, and has been applied there
successfully by Kallinderis and Baron [14].

To show the efficiency gain from adaptation, we computed flow
through a scramjet inlet (Fig. 23) from a paper by Kumar [22].
This grid took under 2 hours of human time to generate by the
first author, illustrating the utility of finite element
methods. In addition, no programming changes were required for
these cases. Exactly the same program was used to run the
channel flows and the scramjets. If we desired to run a three
strut inlet, for example, we would only need to generate the
appropriate initial grid. The initial grid for the scramjet is
shown in Fig. 23. We ran 2 cases, one at Mach 5.0 and another at
Mach 2.85. The Mach 2.85 case was chosen because the flow in the
center passage is just about to choke. If the Mach number is
reduced further, the inner flow chokes and the inlet can unstart
under certain outflow conditions. These cases were run using the
cell-vertex method.

Figure 24 shows the final grid for the Mach 5.0 test case. The
second-difference based adaptation parameter was used, and the
grid was allowed to adapt 5 times. We limited the grid to 3
levels of embedding, mainly because of limitations in the
printer that produced the results. At the scale in this paper,
anything finer would appear as solid black. We have run one test
case where we allowed 4 levels of embedding. Figure 25 shows the
density contours in the inlet. Note the clarity of the shocks
and expansion fans, and note the bending of the shocks as they
pass through the expansions. This is an illustration of a
problem that adapts itself almost everywhere. A globally fine
grid for this problem would have about 5500 elements, as opposed
to 4000 for the adapted grid. The main advantage of the
adaptation in this problem is that the solution after each
refinement starts out with a guess that is closer to the final
answer, so the solution requires fewer iterations to converge.

The next case was a nearly choked inlet. Figure 26 shows the
final grid for this case. The Mach number contours are shown in
Fig. 27. Notice the pattern of shocks just in front of the
throat. There is a very small normal shock right at mid-channel.
This is more apparent in the throat close-up (Fig. 28), where a
distinct subsonic region exists. The minimum Mach number is
about 0.96 in that region. As the inflow Mach number is
decreased, the subsonic region increases in size until the flow
chokes and the solution fails. Also note the difference compared
with the Mach 5 case in shock pattern at the trailing edge and
outflow boundaries. Each of these cases required about 12
minutes of CPU time on the Alliant.

Finally, we ran the Mach 5.0 inlet on a grid of the same
resolution as the initial grid for the cases just shown
(Fig. 23), but with biquadratic elements. Figure 29 shows the
density contours for this case. The flow features are only
slightly less resolved than the bilinear case (Fig. 25), but
this case took only 1.5 minutes to run on the Alliant, a factor
of 9 improvement. We feel this illustrates the potential of the
biquadratic element, even for flows with shocks.

7  Conclusions 

We feel that both the Galerkin and cell-vertex schemes are good,
robust methods for solving the Euler equations. For most
practical problems there is very little difference between them.
We do not recommend the central difference scheme, for
stability, efficiency and dispersive reasons. For
three-dimensional problems, the cell-vertex method has a much
lower operation count.

The techniques and algorithms used here should carry over
directly to three dimensions. Work is in progress on a 3-D,
hexahedral, adaptive code. A major problem in 3-D is displaying
the results.

The directional adaptation has proven to be less effective for
inviscid shocked flows than for viscous flows [14]. One possible
improvement might be to combine grid motion with directional
adaptation. This would reduce a common problem with grid moving
schemes; if certain areas of the grid became too sparse, the
embedding could fill them in.


It is possible to mix a mesh of bilinear and biquadratic
elements in a single problem. This would have the advantage of
allowing biquadratic elements to be put in smoother portions of
the flow, and bilinear elements near shocks, getting the best of
both worlds. The smoothing on biquadratic elements is also under
investigation. We are working on an implementation in which the
biquadratic element is divided into four sub-elements, each of
which is smoothed by the high accuracy method mentioned above.
Figure 6: Surface Mach Number Comparisons for the Three Methods, Transonic Case

Figure 7: Coarse Grid Surface Mach Number, Galerkin, Supersonic Case

Figure 8: Coarse Grid Surface Mach Number, Cell-Vertex, Supersonic Case

Figure 9: Coarse Grid Surface Mach Number, Central Difference, Supersonic Case

Figure 10: Coarse Grid, Mach Number at Mid-channel, All Three Methods, Supersonic Case

(Plot title on the coarse grid supersonic plots: 1300 Elements, M = 1.40)

Figure 11: Surface Mach Number Comparisons, Fine Grid, All Three Methods, Supersonic Case

Figure 12: Grid, 4% Circular Arc Bump, Biquadratic Elements-Supersonic Case

Figure 13: Pressure Contours, 4% Circular Arc Bump, Biquadratic Elements-Supersonic Case

Figure 14: Surface Pressures, 4% Circular Arc Bump, Biquadratic Elements-Supersonic Case

Figure 15: Pressure Contours, 10% Circular Arc Bump, Biquadratic Elements-Transonic Case

Figure 16: Surface Pressures, 10% Circular Arc Bump, Biquadratic Elements-Transonic Case

Figure 17: Initial Grid, 10% Circular Arc Bump-Transonic

Figure 18: Final Grid, 10% Circular Arc Bump-Transonic Case

Figure 19: Pressure Contours, 10% Circular Arc Bump, Distorted Grid-Transonic Case

Figure 20: Surface Pressure, 10% Circular Arc Bump, Distorted Grid-Transonic Case

Figure 21: Final Grid, 10% Circular Arc Bump, Directional Adaptation-Transonic Case

Figure 22: Pressure Contours, 10% Circular Arc Bump, Directional Adaptation-Transonic Case

Figure 23: Scramjet Inlet Initial Grid

Figure 24: Scramjet Final Grid, M=5.0

Figure 25: Density Contours, M=5.0

Figure 27: Mach Number Contours, M=2.85

Figure 29: Density Contours, M=5.0, Biquadratic Elements
ABSTRACT

An  implicit  approximate  factorization  (AF)  algorithm  is  constructed  which  has  the  following 
characteristics. 

•  In  2-D:  The  scheme  is  unconditionally  stable,  has  a  3  x  3  stencil  and  at  steady  state  has  a 
fourth  order  spatial  accuracy.  The  temporal  evolution  is  time  accurate  either  to  1st  or  2nd 
order  through  choice  of  parameter. 

• In 3-D: The scheme has almost the same properties as in 2-D except that it is now only
conditionally stable, with the stability condition (the CFL number) being dependent on the
"cell aspect ratios," $\Delta y/\Delta x$ and $\Delta z/\Delta x$. The stencil is still compact and fourth order accuracy
at steady state is maintained.

Numerical  experiments  on  a  2-D  shock-reflection  problem  show  the  expected  improvement  over 
lower  order  schemes,  not  only  in  accuracy  (measured  by  the  Lj  error)  but  also  in  the  dispersion. 

It is also shown how the same technique is immediately extendable to Runge-Kutta type schemes
resulting in improved stability in addition to the enhanced accuracy.
1 INTRODUCTION

It can be shown [1] that numerical approximations to the linearized Euler equations of gas dynamics
give rise to dispersive errors which in the 2-D supersonic case depend on a similarity parameter
$k = (\Delta y/\Delta x)\sqrt{M_\infty^2 - 1}$ (under the assumption $v \ll u$ everywhere, where $u$ and $v$ are the x
and y components of the velocity vector). The difference between the dispersion relations of any
numerical algorithm and that of the original partial differential equations can be plotted as curves
in the Fourier plane with $k$ as a parameter.

In particular the results in [1] indicate that for central-difference schemes, the dispersive errors
are contributed mostly by the third power of the errors in the Fourier variables $\theta$ and $\phi$. It is,
therefore, natural to think of fourth order spatially accurate algorithms as having better dispersive
properties. By utilizing the structure of the Euler equations one can obtain, on a cartesian grid, a
fourth order approximation which instead of using a 5 x 5 stencil (and 5 x 5 x 5 in 3-D) relies on
a compact support of 3 x 3 (and 3 x 3 x 3 in 3-D). The advantages of the combination of fourth
order accuracy together with compact support are quite obvious in terms of total computer work
and memory.

In  Section  2,  we  describe  the  construction  of  an  approximate  factorization  (AF)  central  differ¬ 
ence  scheme  and  examine  its  theoretical  (linear)  stability  properties.  In  Section  3,  we  derive  the 
corresponding  4-step  Runge-Kutta  scheme  and  show  how  the  Jameson-Schmidt-Turkel  algorithms 
[2]  may  be  easily  modified  to  that  form  which  has,  in  addition  to  the  higher  accuracy,  markedly  en¬ 
hanced  stability  limits.  In  Section  4,  we  describe  some  numerical  experiments  using  the  AF-version. 
Section  5  summarizes  our  findings. 


2  DERIVATION  OF  THE  APPROXIMATE-FACTORIZATION 
SCHEME 


2.1  The  Two-Dimensional  Case 


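Consider a general hyperbolic conservation law in 2-D:

$$\vec{u}_t + \vec{f}_x + \vec{g}_y = 0. \qquad (1)$$

In the case of Euler equations, for example, the vectors $\vec{u}$, $\vec{f}(\vec{u})$, $\vec{g}(\vec{u})$ are given by

$$\vec{u} = \begin{pmatrix} \rho \\ \rho u \\ \rho v \\ E \end{pmatrix}, \qquad
\vec{f} = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ u(E + p) \end{pmatrix}, \qquad
\vec{g} = \begin{pmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ v(E + p) \end{pmatrix} \qquad (2)$$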

where $\rho$, $u$, $v$, $E$, and $p$ are respectively the density, velocity components in the x and y directions,
the total energy per unit volume and the pressure. In addition there is the equation of state relating
algebraically (in the case of gas-dynamics) the internal energy to the pressure and density. One
may also write (1) in non-conservation form as

$$\vec{u}_t + A\,\vec{u}_x + B\,\vec{u}_y = 0 \qquad (3)$$

where $A$ and $B$ are the Jacobians of $\vec{f}$ and $\vec{g}$ with respect to $\vec{u}$.
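For concreteness, the flux vectors of (2) can be evaluated as in the sketch below. The text does not specify the equation of state, so a perfect gas with ratio of specific heats 1.4 is assumed here; the function names are likewise only illustrative.

```python
import numpy as np

GAMMA = 1.4  # assumed ratio of specific heats (perfect gas); not given in the text

def pressure(U):
    """Pressure from the conserved variables U = (rho, rho*u, rho*v, E)."""
    rho, mx, my, E = U
    return (GAMMA - 1.0) * (E - 0.5 * (mx**2 + my**2) / rho)

def fluxes(U):
    """Return the flux vectors f(U) and g(U) of Eq. (2)."""
    rho, mx, my, E = U
    u, v = mx / rho, my / rho
    p = pressure(U)
    f = np.array([mx, mx * u + p, mx * v, u * (E + p)])
    g = np.array([my, my * u, my * v + p, v * (E + p)])
    return f, g

# State with rho = 1, u = 2, v = 0, p = 1, so E = p/(GAMMA - 1) + rho*u**2/2 = 4.5.
f, g = fluxes(np.array([1.0, 2.0, 0.0, 4.5]))
```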
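Consider first a forward-time second order central finite difference approximation to (1)

$$\frac{u^{n+1}_{j,k} - u^{n}_{j,k}}{\Delta t} + \frac{f^{n}_{j+1,k} - f^{n}_{j-1,k}}{2\Delta x} + \frac{g^{n}_{j,k+1} - g^{n}_{j,k-1}}{2\Delta y} = 0 \qquad (4)$$

where we have dropped the sup-arrow indicating a vector and used the conventional differencing
notation

$$u^{n}_{j,k} = u(j\Delta x, k\Delta y, n\Delta t). \qquad (5)$$

We have a cartesian grid with nodes at $x = j\Delta x$, $y = k\Delta y$ and $t = n\Delta t$. We shall now introduce
the usual shift operations notation:

$$D_t u = u^{n+1}_{j,k} - u^{n}_{j,k}, \qquad
\mu_x u = \tfrac{1}{2}\left(u_{j+1/2,k} + u_{j-1/2,k}\right), \qquad
\delta_x u = u_{j+1/2,k} - u_{j-1/2,k}, \qquad (6)$$

with $\mu_y$ and $\delta_y$ defined in the same way in the $k$ index.

Eq. (4) then may be written as:

$$D_t u = -\lambda\,\mu_x\delta_x f - \frac{\lambda}{R}\,\mu_y\delta_y g \qquad (7)$$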

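where $\lambda = \Delta t/\Delta x$ and $R = \Delta y/\Delta x$. If we take a Taylor series expansion about the grid point
$(x, y, t) = (j\Delta x, k\Delta y, n\Delta t)$, we can construct the modified equation corresponding to (7):

$$u_t + \frac{\Delta t}{2} u_{tt} + \cdots = -\left(f_x + \frac{\Delta x^2}{3!} f_{xxx} + \cdots\right) - \left(g_y + \frac{\Delta y^2}{3!} g_{yyy} + \cdots\right)$$

or, keeping the leading error terms,

$$u_t + f_x + g_y = -\left[\frac{\Delta t}{2} u_{tt} + \frac{\Delta x^2}{3!} f_{xxx} + \frac{\Delta y^2}{3!} g_{yyy}\right]. \qquad (8)$$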

Thus if we want to approximate (1) to a higher than second order (and in particular to fourth
order in space; see comment in Introduction concerning dispersive errors) we must modify (7) by
subtracting out the terms on the right hand side of (8). At this point we realize that using a
straightforward approach to approximating $f_{xxx}$ and $g_{yyy}$ will lead to bigger stencils. However,
using the original partial differential equation, (1), we have

$$f_{xxx} = -u_{txx} - g_{yxx} \qquad (9)$$

and similarly

$$g_{yyy} = -u_{tyy} - f_{xyy} \qquad (10)$$

Since $f_{xxx}$ and $g_{yyy}$ need be approximated only to second order (because of their coefficients, $\Delta x^2/3!$
and $\Delta y^2/3!$) the required stencil for the terms on the right hand side is only 3 x 3. (This observation
was previously made, in another context, by Jones, et al. [3].) So, replacing $f_{xxx}$ and $g_{yyy}$ in (8)
by $-\delta_x^2 D_t u/(\Delta x^2 \Delta t) - \delta_x^2 \mu_y\delta_y g/(\Delta x^2 \Delta y)$ and $-\delta_y^2 D_t u/(\Delta y^2 \Delta t) - \delta_y^2 \mu_x\delta_x f/(\Delta y^2 \Delta x)$, respectively; and
also replacing $u_{tt}$ by $D_t u_t/\Delta t \to D_t(-f_x - g_y)/\Delta t \to -D_t[(\mu_x\delta_x f/\Delta x) + (\mu_y\delta_y g/\Delta y)]/\Delta t$ we obtain
the following approximation to (1) which is spatially fourth order accurate and temporally second
order accurate:

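$$D_t\left[u + \frac{\Delta t}{2}\left(\frac{\mu_x\delta_x}{\Delta x} f + \frac{\mu_y\delta_y}{\Delta y} g\right) + \frac{\Delta x^2}{3!}\frac{\delta_x^2}{\Delta x^2} u + \frac{\Delta y^2}{3!}\frac{\delta_y^2}{\Delta y^2} u\right]$$

$$= -\lambda\,\mu_x\delta_x f - \frac{\lambda}{3!}\,\Delta x^3\,\frac{\delta_x^2\,\mu_y\delta_y}{\Delta y\,\Delta x^2}\, g - \frac{\lambda}{R}\,\mu_y\delta_y g - \frac{\lambda}{3!\,R}\,\Delta y^3\,\frac{\delta_y^2\,\mu_x\delta_x}{\Delta x\,\Delta y^2}\, f.$$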

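Using (3) and the definition of $D_t$ we rearrange the above into the delta form

$$\left[I + \frac{\lambda}{2}\left(A\,\mu_x\delta_x + \frac{1}{R} B\,\mu_y\delta_y\right) + \frac{1}{6}\left(\delta_x^2 + \delta_y^2\right)\right]\left(u^{n+1}_{j,k} - u^{n}_{j,k}\right)
= -\lambda\,\mu_x\delta_x\left(I + \frac{\delta_y^2}{6}\right) f^{n}_{j,k} - \frac{\lambda}{R}\,\mu_y\delta_y\left(I + \frac{\delta_x^2}{6}\right) g^{n}_{j,k} \qquad (11)$$

where $I$ is the identity matrix. This is an implicit unfactored scheme. Since we are interested in
marching to steady state we approximate-factor the left hand side to obtain:

$$\left(I + \frac{1}{6}\delta_x^2 + \frac{\alpha\lambda}{2} A\,\mu_x\delta_x\right)\left(I + \frac{1}{6}\delta_y^2 + \frac{\alpha\lambda}{2R} B\,\mu_y\delta_y\right)\left(u^{n+1}_{j,k} - u^{n}_{j,k}\right)
= -\lambda\left\{\mu_x\delta_x\left(I + \frac{1}{6}\delta_y^2\right) f^{n}_{j,k} + \frac{1}{R}\,\mu_y\delta_y\left(I + \frac{1}{6}\delta_x^2\right) g^{n}_{j,k}\right\}. \qquad (12)$$

We introduced into (12) the parameter $\alpha$. If $\alpha = 1$ then we retain the temporal second order
accuracy while $\alpha = 2$ gives first order in time. Note that (12) involves the inversion of block-
tridiagonal matrices. The right hand side represents the steady state operator to fourth-order. The
whole scheme involves only a 3 x 3 stencil.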
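The fourth order accuracy of the steady state operator on the 3 x 3 stencil can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a scalar linear problem with fluxes $f = u$ and $g = 2u$ on a periodic unit square, for which $u = \sin 2\pi(2x - y)$ is an exact steady solution, and compares the ordinary central difference residual with the compact one. All function names are hypothetical.

```python
import numpy as np

def mu_delta(F, axis):
    # Central difference (mu * delta) F = (F[j+1] - F[j-1]) / 2, periodic grid.
    return 0.5 * (np.roll(F, -1, axis=axis) - np.roll(F, 1, axis=axis))

def delta2(F, axis):
    # Second difference delta^2 F = F[j+1] - 2 F[j] + F[j-1], periodic grid.
    return np.roll(F, -1, axis=axis) - 2.0 * F + np.roll(F, 1, axis=axis)

def steady_residual(f, g, dx, dy, compact=True):
    # Axis 0 is x and axis 1 is y. With compact=True the fluxes are first
    # pre-multiplied by (I + delta_y^2/6) and (I + delta_x^2/6) respectively.
    if compact:
        f = f + delta2(f, 1) / 6.0
        g = g + delta2(g, 0) / 6.0
    return mu_delta(f, 0) / dx + mu_delta(g, 1) / dy

def max_residual(n, compact):
    h = 1.0 / n
    x = np.arange(n) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    u = np.sin(2.0 * np.pi * (2.0 * X - Y))  # exact solution of u_x + 2 u_y = 0
    return np.abs(steady_residual(u, 2.0 * u, h, h, compact)).max()

def observed_order(compact):
    # Convergence rate estimated from the 32 x 32 and 64 x 64 grids.
    return np.log2(max_residual(32, compact) / max_residual(64, compact))
```

Calling `observed_order(False)` and `observed_order(True)` gives the measured convergence rates of the two residuals, which should come out close to two and four respectively for this smooth test function.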


2.2  The  3-D  Case 

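The starting point, corresponding to (1) is

$$u_t + f_x + g_y + h_z = 0. \qquad (13)$$

Following the same steps as in the 2-D case, we get the modified equation

$$u_t + f_x + g_y + h_z = -\left[\frac{\Delta t}{2} u_{tt} + \frac{\Delta x^2}{3!} f_{xxx} + \frac{\Delta y^2}{3!} g_{yyy} + \frac{\Delta z^2}{3!} h_{zzz}\right] \qquad (14)$$

where using (13), we have

$$f_{xxx} = -u_{txx} - g_{yxx} - h_{zxx} \qquad (15)$$

$$g_{yyy} = -u_{tyy} - h_{zyy} - f_{xyy} \qquad (16)$$

$$h_{zzz} = -u_{tzz} - f_{xzz} - g_{yzz}. \qquad (17)$$

Repeating all the steps leading to (12) we obtain its three dimensional analog:

$$\left(I + \frac{1}{6}\delta_x^2 + \frac{\alpha\lambda}{2} A\,\mu_x\delta_x\right)\left(I + \frac{1}{6}\delta_y^2 + \frac{\alpha\lambda}{2R} B\,\mu_y\delta_y\right)\left(I + \frac{1}{6}\delta_z^2 + \frac{\alpha\lambda}{2Q} C\,\mu_z\delta_z\right)\left(u^{n+1}_{j,k,l} - u^{n}_{j,k,l}\right)$$

$$= -\lambda\left[\mu_x\delta_x\left(I + \frac{\delta_y^2}{6} + \frac{\delta_z^2}{6}\right) f^{n}_{j,k,l} + \frac{1}{R}\,\mu_y\delta_y\left(I + \frac{\delta_x^2}{6} + \frac{\delta_z^2}{6}\right) g^{n}_{j,k,l} + \frac{1}{Q}\,\mu_z\delta_z\left(I + \frac{\delta_x^2}{6} + \frac{\delta_y^2}{6}\right) h^{n}_{j,k,l}\right] \qquad (18)$$

In (18) the matrix $C$ is the Jacobian $\partial h/\partial u$ and the shift operators are defined in a
manner analogous to (6). $Q$ is the cell aspect ratio in the z direction, $Q = \Delta z/\Delta x$.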
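2.3 Stability Analysis for the 2-D Case

We consider the linearized (i.e., frozen coefficients) version of (12). We carry out a Von-Neumann
analysis in the usual manner by casting (12) in Fourier space. A typical Fourier component of $u^{n}_{j,k}$
is given by

$$u^{n}_{j,k} = \hat{u}^{n} e^{ij\phi} e^{ik\theta} \qquad (19)$$

where

$$\phi = \ell\,\Delta x, \qquad \theta = m\,\Delta y \qquad (20)$$

with

$$-\pi \le \phi, \theta \le \pi \quad \text{and} \quad -\infty < \ell, m < \infty. \qquad (21)$$

The mapping of the various shift operations is as follows:

$$\delta_x u^{n}_{j,k} \to 2i\sin\frac{\phi}{2} = 2i\xi, \qquad \delta_y u^{n}_{j,k} \to 2i\sin\frac{\theta}{2} = 2i\eta \qquad (22)$$

$$\mu_x u^{n}_{j,k} \to \cos\frac{\phi}{2} = \sqrt{1 - \xi^2}, \qquad \mu_y u^{n}_{j,k} \to \cos\frac{\theta}{2} = \sqrt{1 - \eta^2} \qquad (23)$$

$$\delta_x^2 u^{n}_{j,k} \to -4\sin^2\frac{\phi}{2} = -4\xi^2, \qquad \delta_y^2 u^{n}_{j,k} \to -4\sin^2\frac{\theta}{2} = -4\eta^2 \qquad (24)$$

and so,

$$\mu_x\delta_x u^{n}_{j,k} \to 2i\xi\sqrt{1 - \xi^2}, \qquad \mu_y\delta_y u^{n}_{j,k} \to 2i\eta\sqrt{1 - \eta^2}. \qquad (25)$$

We also define the amplification matrix $G$ by $\hat{u}^{n+1} = G\,\hat{u}^{n}$. With these notations, the frozen
coefficient version of (12) is mapped into the Fourier space as follows:

$$\left(I - \frac{2}{3}\xi^2 I + i\alpha\lambda A\,\xi\sqrt{1 - \xi^2}\right)\left(I - \frac{2}{3}\eta^2 I + i\alpha\frac{\lambda}{R} B\,\eta\sqrt{1 - \eta^2}\right)(G - I)$$

$$= -2i\lambda\left[\xi\sqrt{1 - \xi^2}\left(1 - \frac{2}{3}\eta^2\right) A + \frac{1}{R}\,\eta\sqrt{1 - \eta^2}\left(1 - \frac{2}{3}\xi^2\right) B\right]. \qquad (26)$$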

In the general case the matrices $A$ and $B$ do not commute, thus rendering the analysis of (26)
almost intractable. It is instructive, however, to consider the scalar case. Since the aspect ratio
$R = \Delta y/\Delta x$ is arbitrary, and since the original partial differential equation (1), in the scalar linear
case, could be transformed to the wave equation

$$u_{\tilde{t}} + u_{\tilde{x}} + u_{\tilde{y}} = 0$$

with $\tilde{t} = At$, $\tilde{x} = x$ and $\tilde{y} = Ay/B$, we can without loss of generality (in this special linear scalar
case) rewrite (26) as:


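$$\left(1 - \frac{2}{3}\xi^2 + i\alpha\lambda\,\xi\sqrt{1 - \xi^2}\right)\left(1 - \frac{2}{3}\eta^2 + i\alpha\lambda\,\eta\sqrt{1 - \eta^2}\right)(G - 1)$$

$$= -2i\lambda\left[\xi\sqrt{1 - \xi^2}\left(1 - \frac{2}{3}\eta^2\right) + \eta\sqrt{1 - \eta^2}\left(1 - \frac{2}{3}\xi^2\right)\right]. \qquad (27)$$

Equation (27) can be rearranged to solve for the amplification factor $G$:

$$G = \frac{a + i\left(\frac{\alpha}{2} - 1\right) b}{a + i\,\frac{\alpha}{2}\, b} \qquad (28)$$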
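where

$$a = 1 - \frac{2}{3}\xi^2 - \frac{2}{3}\eta^2 + \frac{4}{9}\xi^2\eta^2 - \alpha^2\lambda^2\,\xi\eta\sqrt{1 - \xi^2}\sqrt{1 - \eta^2} \qquad (29)$$

and $b$ is the Fourier map of the steady-state operator, i.e.:

$$b = 2\lambda\left[\xi\sqrt{1 - \xi^2}\left(1 - \frac{2}{3}\eta^2\right) + \eta\sqrt{1 - \eta^2}\left(1 - \frac{2}{3}\xi^2\right)\right]. \qquad (30)$$

Since $a$ and $b$ are real, $|G|^2 = [a^2 + (\alpha/2 - 1)^2 b^2]/[a^2 + (\alpha/2)^2 b^2]$. We see immediately from (28)
that for $\alpha = 1$ (i.e., the case of second order temporal accuracy) we have a Crank-Nicolson type
scheme with $\|G\| = 1$. For $\alpha > 1$ (i.e., first order accuracy in time), we have $\|G\| \le 1$, with strict
inequality wherever $b \ne 0$. We have thus demonstrated the unconditional stability of (27) for all
values of the cell aspect ratio.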
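The stability statement can also be checked by direct evaluation of the scalar amplification factor of (28). The sketch below assumes the forms $a = (1 - \tfrac{2}{3}\xi^2)(1 - \tfrac{2}{3}\eta^2) - \alpha^2\lambda^2\xi\eta\sqrt{1-\xi^2}\sqrt{1-\eta^2}$ and $b = 2\lambda[\xi\sqrt{1-\xi^2}(1 - \tfrac{2}{3}\eta^2) + \eta\sqrt{1-\eta^2}(1 - \tfrac{2}{3}\xi^2)]$ obtained by expanding the two factors of (27); the function names and the sampling grid are choices made for this example.

```python
import numpy as np

def amplification_factor(xi, eta, lam, alpha):
    """Scalar amplification factor G of Eq. (28) for xi, eta in [-1, 1]."""
    sx = xi * np.sqrt(1.0 - xi**2)
    sy = eta * np.sqrt(1.0 - eta**2)
    cx = 1.0 - 2.0 * xi**2 / 3.0
    cy = 1.0 - 2.0 * eta**2 / 3.0
    a = cx * cy - (alpha * lam) ** 2 * sx * sy   # real part of the implicit factor
    b = 2.0 * lam * (sx * cy + sy * cx)          # Fourier map of the steady operator
    return (a + 1j * (0.5 * alpha - 1.0) * b) / (a + 1j * 0.5 * alpha * b)

XI, ETA = np.meshgrid(np.linspace(-1.0, 1.0, 41), np.linspace(-1.0, 1.0, 41))

def max_gain(lam, alpha):
    # Largest |G| over the sampled Fourier modes.
    return np.abs(amplification_factor(XI, ETA, lam, alpha)).max()
```

For any positive value of `lam`, `max_gain(lam, 1.0)` stays at one and `max_gain(lam, 2.0)` does not exceed one, whereas a value of `alpha` below one gives gains larger than one.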

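2.4 Stability for the 3-D Case

The three dimensional analog to (27) is

$$\left(1 - \frac{2}{3}\xi^2 + i\alpha\lambda\,\xi\sqrt{1 - \xi^2}\right)\left(1 - \frac{2}{3}\eta^2 + i\alpha\frac{\lambda}{R}\,\eta\sqrt{1 - \eta^2}\right)\left(1 - \frac{2}{3}\zeta^2 + i\alpha\frac{\lambda}{Q}\,\zeta\sqrt{1 - \zeta^2}\right)(G - 1)$$

$$= -2i\lambda\left[\xi\sqrt{1 - \xi^2}\left(1 - \frac{2}{3}\eta^2 - \frac{2}{3}\zeta^2\right) + \frac{1}{R}\,\eta\sqrt{1 - \eta^2}\left(1 - \frac{2}{3}\xi^2 - \frac{2}{3}\zeta^2\right) + \frac{1}{Q}\,\zeta\sqrt{1 - \zeta^2}\left(1 - \frac{2}{3}\xi^2 - \frac{2}{3}\eta^2\right)\right] \qquad (31)$$

where $\zeta$ is the Fourier dual variable defined analogously to $\xi$ and $\eta$.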

The  stability  of  three  dimensional  amplification  factor  G,  as  defined  by  (31),  is  difficult  to 
analyze  and  we  resorted  to  numerical  evaluations  of  |G|2,  using  up  to  8  X  10®  Fourier  modes.  The 
numerical study of (31) was carried out on the Cray 2. We found that, as in the case of the forward
Euler approximate factorization second order scheme [4], the scheme is only conditionally
stable. For example for $R = Q = 1$, the stability limit is $\lambda < .43$. These stability properties are
obtained with the aid of an artificial viscosity (AV) term which is of sixth order but still resides on
the compact stencil. The AV term is added to the right hand side (explicit term) of (18) and is of
the form

$$\mu_{av}\,\lambda\,\delta_x^2\,\delta_y^2\,\delta_z^2\,u^{n}_{j,k,l} \qquad (32)$$

where $\mu_{av}$ is of order unity. We found $1.5 < \mu_{av} < 2$ to be most efficacious. Without the artificial
viscosity term, the allowed value of $\lambda$ is about one order of magnitude less (e.g., for $R = Q = 1$, $\lambda$
without using AV is about .035).


3 COMPACT FOUR STEP FOURTH ORDER RUNGE-KUTTA
ALGORITHM

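The basic four step explicit Runge-Kutta scheme, as proposed in [2], takes the form:

$$\begin{aligned}
u^{(0)} &= u^{(n)} \\
u^{(1)} &= u^{(n)} - \tfrac{1}{4}\,\Delta t\, R(u^{(0)}) \\
u^{(2)} &= u^{(n)} - \tfrac{1}{3}\,\Delta t\, R(u^{(1)}) \\
u^{(3)} &= u^{(n)} - \tfrac{1}{2}\,\Delta t\, R(u^{(2)}) \\
u^{(4)} &= u^{(n)} - \Delta t\, R(u^{(3)}) \\
u^{(n+1)} &= u^{(4)}
\end{aligned} \qquad (33)$$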


where R is the finite difference (or finite volume) representation of the steady operator. For example,
in the 2-D second order spatial accuracy case

$$R(u) = \frac{\mu_x \delta_x}{\Delta x}\, f + \frac{\mu_y \delta_y}{\Delta y}\, g \tag{34}$$


where f and g are the flux vectors encountered in section 2. If we want fourth order spatial accuracy,
then it follows directly that the residual R is modified to read


(35) 


and  we  still  retain  the  compact  support. 
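
The multistage update (33) can be sketched in a few lines. This is only an illustration: the stage coefficients 1/4, 1/3, 1/2, 1 are assumed to be the standard ones for this four-stage scheme, and `linear_residual` is a hypothetical stand-in for the spatial residual R, chosen so that one step can be compared with the fourth order Taylor polynomial of the exact solution.

```python
def rk4_multistage_step(u, dt, residual):
    """Advance du/dt = -R(u) by one step of the four-stage scheme (33).

    Every stage restarts from u^(n); only the latest stage value feeds R.
    """
    stage_coeffs = (1.0 / 4.0, 1.0 / 3.0, 1.0 / 2.0, 1.0)  # assumed coefficients
    u_stage = u
    for alpha in stage_coeffs:
        u_stage = u - alpha * dt * residual(u_stage)
    return u_stage


def linear_residual(u, lam=2.0):
    """Hypothetical model residual R(u) = lam * u (not the Euler residual)."""
    return lam * u


# One step on the model problem du/dt = -2u, starting from u = 1.
u_new = rk4_multistage_step(1.0, 0.1, linear_residual)
print(u_new)
```

For the linear model problem the four stages telescope into $1 - z + z^2/2 - z^3/6 + z^4/24$ with $z = \lambda\,\Delta t$, which is why the scheme is fourth order in time for linear operators.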

Similarly,  in  the  three  dimensional  case  we  have 


*-&■*+ k + k» + £<* + k + + k + k» 


Ay 


6 


A  z 


6 


(36) 


Thus  (33),  with  R  given  either  by  (35)  for  the  2-D  case  or  by  (36)  for  the  3-D  case,  retains  all  the 
features  of  the  second  order  scheme  but  gains  us  the  fourth  order  accuracy.  In  addition  one  can 
easily  verify  by  simple  analysis  that  for  a  given  cartesian  grid  and  flow  conditions  the  new  fourth 
order  formulation  enhances  the  stability  condition.  In  the  2-D  case  we  have,  using  (35)  rather  than 
(34) 

(At)(4)  £  (At)  or  (Ur  _  j  jg 

(At)(j)  (AtJj^or.fcr 

(37)

In the 3-D case the gain is even more favorable,

$$\frac{(\Delta t)_{(4)}}{(\Delta t)_{(2)}} = 1.66. \tag{38}$$

Thus the algorithm efficiency gains are twofold. First, for a given acceptable error level the fourth
order accuracy allows a coarser grid, i.e., fewer node points. Second, not only is $\Delta t$ increased due
to the larger cell size, but it gains in addition due to (37) (or (38) in the 3-D case).
4  NUMERICAL RESULTS FOR 2-D CASE

The  two-dimensional  fourth  order  compact  scheme  is  applied  to  a  shock  reflection  problem  sketched 
in  Figure  1.  It  shows  a  5°  shock  at  Mach  1.95  reflecting  from  a  flat  plate.  Results  are  presented 
both  without  any  explicit  artificial  viscosity  (AV)  and  with  a  fourth  order  AV  term  of  the  type 
-  )  added  to  the  right  hand  side.  Addition  of  this  AV  term  reduced  the  accuracy  of  the 

compact scheme to third order (note that in 3-D the sixth order AV term precludes this reduction
in accuracy). Calculations are also made with a second order implicit Euler AF scheme without and
with the preceding AV term. Figures 2a and 2b show the results of the compact scheme whereas
Figures 3a and 3b show the corresponding results from the second order scheme. It is seen that for
$\epsilon$ = 0, both fourth and second order schemes produce very oscillatory results but with $\epsilon$ = 0.36, the
results from the compact scheme (Figure 2b) improve dramatically and, in fact, are much better
than the corresponding results obtained from the second order scheme (Figure 3b). The shocks
captured by the compact scheme are sharper and the convergence is also seen to improve.

Results from a study of grid aspect ratio effect on the compact scheme are also presented here
in terms of the similarity parameter k of reference 1. Three values of k are considered, namely 1.67,
1.01, and 0.42. Figure 2b corresponds to k = 1.67 and Figures 4 and 5 correspond to k = 1.01 and
0.42, respectively. In all these cases, $\epsilon$ is set equal to 0.36. It is clear from the figures displaying
the effect of k that the best results are obtained for k near unity. In reference 1, a linear theory
(e.g., for weak shock) predicts the same results. It is interesting that we find numerically that this
is also the case in the present nonlinear problem.

SUMMARY 

1. The steady state solution of the Euler equations of gas dynamics may be achieved to fourth
order accuracy using a compact grid stencil of 3 x 3 and 3 x 3 x 3 in the 2-D and 3-D cases
respectively. We presented two examples of such algorithms: one implicit (Euler approximate
factorization scheme) and one explicit (four-stage Runge-Kutta).

2. Numerical experiments were carried out for the 2-D shock reflection problem, using the implicit
algorithm. Comparisons are made with a corresponding second order scheme. The
results show that the compact higher order scheme offers marked improvement in both accuracy
and convergence rate.

3. In connection with this work we would like to make the following remarks. It is known that
the finite difference scheme cannot obtain high order accuracy for conservation-form equations
computed on a non-uniform grid. This observation, coupled with the ease of obtaining fourth
order compact schemes even in 3-D for the Euler equations, revives an old debate: can one use
a uniform grid and conveniently apply boundary conditions to arbitrary shapes? The potential
gain in reduced number of computational nodes and enhanced convergence rate appears large
enough to study this question again.
Figure 1: Shock reflection problem

Figure 2a: Pressure distribution and residual plot for compact scheme ($\epsilon$ = 0, k = 1.67).

1  Abstract


Dispersive  errors  in  the  discretisation  of  the  steady  Euler  equations  describing  the  flow  of  a  compressible, 
inviscid,  ideal  gas  can  produce  low  wave  number  oscillations  near  regions  of  high  gradient.  A  linearised  analysis  is 
presented  which  allows  one  to  predict  the  location  and  frequency  of  these  oscillations.  This  analysis  is  applied  to 
three  numerical  schemes:  Galerkin  finite  element,  cell-vertex  finite  element,  and  central  difference  finite  element. 
Numerical  experiments  are  presented  verifying  the  analysis.  An  example  showing  the  applicability  of  the  analysis 
to  a  problem  with  significant  nonlinearity  is  shown. 


2  Introduction 


Numerical solutions of the Euler equations describing the dynamics of an inviscid, compressible, ideal gas are
becoming an important tool for the practicing aerodynamicist [1]. In many solutions, low wave number oscillations
have been observed in the vicinity of rapid flow variations [2]. These oscillations cannot be explained by problems
in artificial viscosity, as their frequency is very low, and the amplitude is relatively independent of the amount of
artificial dissipation used.

In  this  paper,  we  examine  the  dispersive  properties  of  three  particular  algorithms.  Each  of  these  algorithms 
is  derived  from  a  finite  element  formulation,  discussed  in  detail  in  [3]  and  briefly  discussed  in  Section  3.1.  These 
three  algorithms  are  the  Galerkin  finite  element  method  [4,5,6],  the  cell-vertex  finite  element  method,  and  the 
central difference finite element method [7]. The cell-vertex algorithm is identical to the node-based finite volume
method  [8,9]  or  the  first-order  step  in  Ni’s  method  [10].  On  grids  with  parallelogram  elements,  the  central 
difference  method  is  equivalent  to  Jameson’s  cell-centered  finite  volume  method  [11]. 

The approach taken is to analyse the dispersive properties of the linearised, steady Euler equations on a
regular mesh, using the spatial derivative operator for each of the three methods discussed (Galerkin,
cell-vertex, central difference). This analysis is applied to a model problem, and the prediction of the
frequency and location of the dispersive oscillations is demonstrated. Finally, the analytic theory is
validated by comparison with numerical experiments.

3  Solution  Algorithm 

In  this  study,  the  two-dimensional  Euler  equations  describing  the  flow  of  an  inviscid,  compressible  fluid  are 
considered.  To  allow  the  capture  of  shocks  and  other  discontinuous  phenomena  (such  as  slip  lines),  the  Euler 
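equations are written in conservative vector form as

$$
\frac{\partial}{\partial t}
\begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho e \end{bmatrix}
+ \frac{\partial}{\partial x}
\begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u h \end{bmatrix}
+ \frac{\partial}{\partial y}
\begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v h \end{bmatrix}
= 0 \tag{1}
$$

or

$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = 0 \tag{2}$$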

where e is total energy, p is pressure, $\rho$ is density, u and v are the x and y flow velocities, U is a vector of
state variables, F and G are flux vectors in the x and y directions, and h is the total enthalpy, given by the
thermodynamic relation

$$h = e + \frac{p}{\rho}. \tag{3}$$

In addition, one requires the equation of state

$$p = (\gamma - 1)\,\rho \left[ e - \tfrac{1}{2}\left(u^2 + v^2\right) \right] \tag{4}$$

where the specific heat ratio $\gamma$ is taken as a constant (1.4) for all calculations reported.
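
As a concrete reading of the state and flux vectors and the two closure relations, the following sketch evaluates U, F and G at a single point. It assumes the perfect gas form $p = (\gamma - 1)\rho[e - (u^2 + v^2)/2]$ with e the total energy per unit mass; the function names are illustrative and are not taken from the authors' code.

```python
GAMMA = 1.4  # specific heat ratio used for all calculations in the text


def pressure(rho, u, v, e):
    """Equation of state for a perfect gas; e is total energy per unit mass."""
    return (GAMMA - 1.0) * rho * (e - 0.5 * (u * u + v * v))


def euler_state_and_fluxes(rho, u, v, e):
    """Return the state vector U and the flux vectors F (x) and G (y)."""
    p = pressure(rho, u, v, e)
    h = e + p / rho  # total enthalpy
    state = (rho, rho * u, rho * v, rho * e)
    flux_x = (rho * u, rho * u * u + p, rho * u * v, rho * u * h)
    flux_y = (rho * v, rho * u * v, rho * v * v + p, rho * v * h)
    return state, flux_x, flux_y


# Gas at rest: only the pressure terms of the momentum fluxes survive.
U, F, G = euler_state_and_fluxes(rho=1.0, u=0.0, v=0.0, e=2.5)
print(U, F, G)
```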


3.1  Spatial  discretisation 


The  finite  element  approach  to  discretizing  these  equations  divides  the  domain  into  elements  determined  by 
some  number  of  nodes  (in  the  current  implementation,  4  nodes  make  up  a  bilinear  element). 

Within each element the state vector $U^{(e)}$ and flux vectors $F^{(e)}$ and $G^{(e)}$ are written

$$U^{(e)} = \sum_i N_i^{(e)} U_i^{(e)} \tag{5}$$

$$F^{(e)} = \sum_i N_i^{(e)} F_i^{(e)} \tag{6}$$

$$G^{(e)} = \sum_i N_i^{(e)} G_i^{(e)} \tag{7}$$

where $U_i^{(e)}$, $F_i^{(e)}$, and $G_i^{(e)}$ are the nodal values of the state and flux vectors in element e and the
$N_i^{(e)}$ are a set of bilinear interpolation functions on that element. These interpolation functions are
expressed in terms of local coordinates $(\xi, \eta)$, which are related to (x, y) by an isoparametric
transformation. Thus, inherent in the formulation that follows are some transformational metrics, which
are not shown for clarity.
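
A minimal sketch of the bilinear interpolation functions on the reference square $-1 \le \xi, \eta \le 1$ follows. The node ordering (counter-clockwise from the lower left corner) is an assumption made for the example, and the isoparametric metrics are omitted just as in the text.

```python
# Corner nodes of the reference element, counter-clockwise (assumed ordering).
NODES = ((-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0))


def shape_functions(xi, eta):
    """Bilinear interpolation functions N_i(xi, eta) for the four corner nodes."""
    return [0.25 * (1.0 + xi * xn) * (1.0 + eta * en) for xn, en in NODES]


def interpolate(nodal_values, xi, eta):
    """Evaluate sum_i N_i * value_i, as in Eqs. (5)-(7)."""
    return sum(n * val for n, val in zip(shape_functions(xi, eta), nodal_values))


# Nodal samples of the linear field 2 + 3*xi - eta are reproduced exactly.
samples = [2.0 + 3.0 * xn - en for xn, en in NODES]
print(interpolate(samples, 0.2, 0.5))
```

Each function equals one at its own node and zero at the other three, and the four functions sum to one everywhere in the element.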

These expressions can be differentiated to obtain an expression for the derivative in each element in terms of
the nodal values (shown here for the state vectors)

$$\frac{\partial U^{(e)}}{\partial x_j} = \sum_i \frac{\partial N_i^{(e)}}{\partial x_j}\, U_i^{(e)}. \tag{8}$$

The flux vector derivatives are calculated the same way.

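The expression for the derivatives is substituted into equation (2) and summed over all elements to obtain

$$\sum_i N_i \frac{dU_i}{dt} + \sum_i \frac{\partial N_i}{\partial x} F_i + \sum_i \frac{\partial N_i}{\partial y} G_i = 0 \tag{9}$$

where N is now a global vector of interpolation functions, determined by summing the interpolation functions
for each element.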


Figure 1: Difference Stencils for x Derivative, Three Methods (panels: Galerkin, Cell-Vertex, Central Difference)

The next step can be thought of as a projection onto the space spanned by some other functions N', called
test functions, such that the error in the discretisation is orthogonal to the space spanned by the test functions
(for more detail on the mathematics involved see [12]). To do this, multiply Eq. (9) by N' and integrate over the
entire domain. This results in the semi-discrete equation

$$M \frac{dU_i}{dt} = -R_x F_i - R_y G_i \tag{12}$$

where M is the consistent mass matrix and $R_x$ and $R_y$ are what we call residual matrices. The mass matrix
M is sparse, symmetric, and positive definite, but not structured, so it is replaced by a lumped (diagonal) mass
matrix $M_L$ in which each diagonal entry is the sum of all the elements in the corresponding row of M. This
allows Eq. (12) to be solved explicitly. The lumping does not change the steady-state solution, but does modify
the time behavior of the algorithm. Finally, this set of ODEs is integrated in time to obtain a steady solution.
The details of the time integration, artificial viscosity formulation, and boundary conditions are not critical to
the understanding of dispersion, and can be found in [3].
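
The row-sum lumping described above can be illustrated on a deliberately simplified case. The sketch below assembles the consistent mass matrix for one-dimensional linear elements on a uniform mesh (an assumption made to keep the example short; the paper uses bilinear quadrilaterals) and then lumps it.

```python
def consistent_mass_1d(num_nodes, h):
    """Assemble the consistent mass matrix for 1-D linear elements of size h."""
    mass = [[0.0] * num_nodes for _ in range(num_nodes)]
    element = ((2.0 * h / 6.0, h / 6.0), (h / 6.0, 2.0 * h / 6.0))
    for e in range(num_nodes - 1):  # element e joins nodes e and e + 1
        for a in range(2):
            for b in range(2):
                mass[e + a][e + b] += element[a][b]
    return mass


def lump(mass):
    """Diagonal of the lumped matrix: each entry is the sum of a row of M."""
    return [sum(row) for row in mass]


# Four nodes, spacing 0.5: interior nodes carry h, end nodes carry h / 2.
lumped = lump(consistent_mass_1d(4, 0.5))
print(lumped)
```

The lumped entries still add up to the length of the domain, so the total "mass" is preserved while the matrix becomes trivially invertible.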

3.2  Choice  of  Test  Functions 

Various choices for N' are possible, each giving rise to a particular discretisation. If one chooses each $N'_i$
to be the corresponding $N_i^{(e)}$, one obtains the Galerkin finite element approximation. If one chooses each $N'_i$
to be a constant, the "cell-vertex" approximation [8,9] results. This approximation is identical to a node-based
finite volume method. Finally, if the $N'_i$ are chosen as


N*-[ 


one  obtains  the  central  difference  or  collocation  approximation  [7].  On  a  mesh  of  parallelograms,  this  is  identical 
to  a  cell-based  finite  volume  method.  If  the  mesh  is  anything  else,  this  equivalence  does  not  hold.  Figure  1  shows 
the  difference  stencils  for  an  x  derivative  on  a  mesh  with  Ax  =  1.  Some  properties  of  these  stencils  are  discussed 
in  the  following  sections. 


Figure 2: Difference Stencil For a Single Cell-Vertex Element (panels: x derivative, y derivative)


3.3  Some Properties of the Galerkin Scheme

This  approximation  has  several  interesting  features.  First,  it  gives  the  minimum  steady-state  error,  since 
there  is  no  component  of  that  error  in  the  space  of  the  interpolation  functions.  Second,  for  the  steady-state  Euler 
equations  on  a  uniform  mesh,  it  is  a  fourth-order  accurate  approximation.  This  interesting  feature  was  pointed 
out  by  Abarbanel  [13],  and  can  be  seen  by  examining  the  truncation  error  for  the  x  and  y  derivatives.  The 
truncation  errors  for  the  derivatives  of  F  and  G  with  respect  to  x  and  y  can  be  written  (with  P  representing  the 
derivative  operator)  as 

$$P_x F = F_x + \tfrac{1}{6} F_{xxx}\,\Delta x^2 + \tfrac{1}{6} F_{xyy}\,\Delta y^2 + \text{H.O.T.} \tag{14}$$

$$P_y G = G_y + \tfrac{1}{6} G_{yyy}\,\Delta y^2 + \tfrac{1}{6} G_{xxy}\,\Delta x^2 + \text{H.O.T.} \tag{15}$$

so the discrete, steady residual $P_x F + P_y G$ can be written

$$P_x F + P_y G = F_x + G_y + \tfrac{1}{6}\Delta x^2 \left(F_{xxx} + G_{xxy}\right) + \tfrac{1}{6}\Delta y^2 \left(F_{xyy} + G_{yyy}\right) + \text{H.O.T.} \tag{16}$$

But in Eq. (16) note that the coefficients of the $\Delta x^2$ and $\Delta y^2$ terms are the second derivatives of the
quantity $F_x + G_y$ with respect to x and y, respectively. For the steady Euler equations, the quantity
$F_x + G_y = 0$ so these terms will be higher-order. Thus, on a regular mesh of parallelograms, the Galerkin
method should be fourth order accurate. Note that this cancellation of truncation error fails if the equation
is inhomogeneous (such as for the Navier-Stokes or Conical Euler equations), or if the mesh is not
parallelograms (as in most practical problems).
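
The cancellation argument can be checked numerically. The sketch below is an illustration only: it assumes the lumped Galerkin x-derivative stencil is a central difference averaged with weights (1, 4, 1)/6 in the cross direction, and it uses a hypothetical pair of fluxes, F = 2 sin(x - 2y) and G = sin(x - 2y), chosen so that $F_x + G_y = 0$ exactly. Halving the mesh size should then reduce the Galerkin residual by about 16 and the plain central difference residual by about 4.

```python
import math


def flux_f(x, y):
    return 2.0 * math.sin(x - 2.0 * y)  # hypothetical test flux


def flux_g(x, y):
    return math.sin(x - 2.0 * y)  # chosen so that F_x + G_y = 0


def galerkin_residual(x, y, h):
    """P_x F + P_y G with the (1, 4, 1)/6 cross-direction averaging (assumed)."""
    weights = (1.0, 4.0, 1.0)
    offsets = (-h, 0.0, h)
    px_f = sum(w * (flux_f(x + h, y + d) - flux_f(x - h, y + d))
               for w, d in zip(weights, offsets)) / (12.0 * h)
    py_g = sum(w * (flux_g(x + d, y + h) - flux_g(x + d, y - h))
               for w, d in zip(weights, offsets)) / (12.0 * h)
    return px_f + py_g


def central_residual(x, y, h):
    """Same residual with plain central differences and no averaging."""
    dfdx = (flux_f(x + h, y) - flux_f(x - h, y)) / (2.0 * h)
    dgdy = (flux_g(x, y + h) - flux_g(x, y - h)) / (2.0 * h)
    return dfdx + dgdy


def observed_order(residual, x=0.3, y=0.1, h=0.1):
    """Estimate the order of accuracy from the residual at h and h / 2."""
    return math.log2(abs(residual(x, y, h)) / abs(residual(x, y, h / 2.0)))


print(observed_order(galerkin_residual), observed_order(central_residual))
```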


3.4  Some  Properties  of  the  Cell-Vertex  Scheme 

The  most  interesting  property  of  the  cell-vertex  scheme  is  that  it  is  second-order  accurate  on  any  mesh  of 
parallelograms,  independent  of  mesh  stretching.  To  see  this,  consider  the  contribution  of  a  single  element  to  the 
residual  at  a  node.  Figure  2  shows  the  difference  stencils  for  the  x  and  y  derivatives  at  the  node  denoted  by  the 
open  circle. 

Following  the  approach  of  the  previous  section,  the  truncation  errors  for  the  derivatives  of  F  and  G  with 
respect  to  x  and  y  can  be  written 

$$P_x F = F_x + F_{xx}\frac{\Delta x}{2} + F_{xy}\frac{\Delta y}{2} + \text{H.O.T.} \tag{17}$$

$$P_y G = G_y + G_{yy}\frac{\Delta y}{2} + G_{xy}\frac{\Delta x}{2} + \text{H.O.T.} \tag{18}$$

so the discrete, steady residual $P_x F + P_y G$ can be written

$$P_x F + P_y G = F_x + G_y + \frac{\Delta x}{2}\left(F_{xx} + G_{xy}\right) + \frac{\Delta y}{2}\left(F_{xy} + G_{yy}\right) + \text{H.O.T.} \tag{19}$$

But again note that in Eq. (19) the coefficients of the $\Delta x$ and $\Delta y$ terms are the derivatives of the quantity
$F_x + G_y$ with respect to x and y. For the steady Euler equations, the quantity $F_x + G_y = 0$ so these terms will
be higher-order. Thus, on any mesh of parallelograms, the cell-vertex method should be second-order accurate.
Note that this cancellation of truncation error again fails if the equation is inhomogeneous. This seems to indicate
that the cell-vertex method may be superior on highly stretched grids, or on grids with embedded regions. For
most practical problems, little difference is observed between the cell-vertex and Galerkin methods.


4  Linearization  of  the  Equations 


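This section describes the linearisation of the Euler equations, using a method suggested by Giles [14]. The
2-D Euler equations (Eq. 1) can be rewritten

$$\frac{\partial U}{\partial t} + \frac{\partial AU}{\partial x} + \frac{\partial BU}{\partial y} = 0 \tag{20}$$

where

$$
U = \begin{bmatrix} \rho \\ u \\ v \\ p \end{bmatrix}, \qquad
A = \begin{bmatrix} u & \rho & 0 & 0 \\ 0 & u & 0 & 1/\rho \\ 0 & 0 & u & 0 \\ 0 & \gamma p & 0 & u \end{bmatrix}, \qquad
B = \begin{bmatrix} v & 0 & \rho & 0 \\ 0 & v & 0 & 0 \\ 0 & 0 & v & 1/\rho \\ 0 & 0 & \gamma p & v \end{bmatrix}
\tag{21}
$$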


The equations are linearized by "freezing" the A and B matrices, so they can be removed from inside the derivative
operator. In the steady state, the time derivative vanishes, so we can write the linearized Euler equations in
operator form as

$$(A s_x + B s_y)\, U = 0 \tag{22}$$

where $s_x$ and $s_y$ are the x and y derivative operators. If we desire non-trivial solutions to this equation, the
operator matrix $(A s_x + B s_y)$ must have zero determinant. This is the statement that


$$
\begin{vmatrix}
u s_x + v s_y & \rho s_x & \rho s_y & 0 \\
0 & u s_x + v s_y & 0 & s_x/\rho \\
0 & 0 & u s_x + v s_y & s_y/\rho \\
0 & \gamma p\, s_x & \gamma p\, s_y & u s_x + v s_y
\end{vmatrix} = 0. \tag{23}
$$

Defining

$$r = \frac{s_x}{\sqrt{s_x^2 + s_y^2}} \tag{24}$$

$$s = \frac{s_y}{\sqrt{s_x^2 + s_y^2}} \tag{25}$$

Eq. (23) can be expanded to

$$(ru + sv)^2 \left[ a^2 (r^2 + s^2) - (ru + sv)^2 \right] = 0 \tag{26}$$

where a is the speed of sound. This has solutions

$$ru + sv = \begin{cases} 0 \\ \pm a \end{cases} \tag{27}$$


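Now let

$$\vec{s} = \begin{bmatrix} r \\ s \end{bmatrix}, \qquad \vec{u} = \begin{bmatrix} u \\ v \end{bmatrix}$$

so that Equation (27) becomes

$$\vec{s} \cdot \vec{u} = \begin{cases} 0 \\ \pm a \end{cases} \tag{28}$$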


Since $\vec{s}$ has unit norm, the non-zero solution will exist only if the flow is supersonic. So far, no restrictions
have been placed on the derivative operators $s_x$ and $s_y$. The analysis above applies to the exact derivative
operators as well as any of the discrete operators. The next section introduces the discrete equations and their solution.
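
The factorisation of the determinant can be spot-checked numerically. The sketch below treats $s_x$ and $s_y$ as plain numbers, builds the operator matrix from the frozen A and B matrices for the primitive variables $(\rho, u, v, p)$, and compares its determinant with $q^2\,[q^2 - a^2(s_x^2 + s_y^2)]$, where $q = u s_x + v s_y$ and $a^2 = \gamma p/\rho$. Up to an overall sign and a division by $(s_x^2 + s_y^2)^2$ this is the left-hand side of Eq. (26). The helper names are illustrative.

```python
def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j, entry in enumerate(matrix[0]):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def operator_matrix(rho, u, v, p, sx, sy, gamma=1.4):
    """A*sx + B*sy for the frozen-coefficient Euler equations."""
    q = u * sx + v * sy
    return [
        [q, rho * sx, rho * sy, 0.0],
        [0.0, q, 0.0, sx / rho],
        [0.0, 0.0, q, sy / rho],
        [0.0, gamma * p * sx, gamma * p * sy, q],
    ]


def factored_determinant(rho, u, v, p, sx, sy, gamma=1.4):
    """Closed form q^2 * (q^2 - a^2 * (sx^2 + sy^2))."""
    q = u * sx + v * sy
    a_squared = gamma * p / rho
    return q * q * (q * q - a_squared * (sx * sx + sy * sy))


args = (1.2, 2.0, 0.3, 1.0, 0.7, -0.4)  # arbitrary sample values
print(det(operator_matrix(*args)), factored_determinant(*args))
```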


5  Fourier  Analysis  of  the  Linearized  Equations 


This section introduces the spatial discretisations of the equations into the linear model, and discusses the
consequences of the truncation error in the approximations. Many of the ideas used here can be found in [15],
but those analyses were performed for a scalar problem involving only one spatial direction and time.


For purposes of analysis, assume that the equations are discretised on a Cartesian $N_x \times N_y$ mesh with grid
spacings in the x and y directions of $\Delta x$ and $\Delta y$. Let $x = j\Delta x$ and $y = k\Delta y$, then assume the state vector is of
the form

$$U(j\Delta x, k\Delta y) = \sum_{m=0} \sum_{n=0} \exp i\left(j\Delta x\,\phi_m + k\Delta y\,\theta_n\right) U'_{mn} \tag{29}$$

where $\phi_m$ and $\theta_n$ are spatial frequencies in the x and y directions and $U'_{mn}$ is some eigenvector. The spatial
frequencies are related to m and n by the relations

$$\phi_m = \frac{2\pi m}{N_x} \tag{30}$$

$$\theta_n = \frac{2\pi n}{N_y} \tag{31}$$


Now consider a model problem in which $\Delta x = 1$, $\Delta y$ = Æ and v ≪ u, and u = Ma. Then Eq. (27) has the
solution

$$\frac{s_x}{\sqrt{s_x^2 + s_y^2}} = \pm \frac{1}{M}. \tag{32}$$

For a particular choice of spatial discretisation, there is a particular dispersive character for a given Mach number
M. Table 1 shows $s_x$ and $s_y$ for the Galerkin, cell-vertex and central difference methods, as well as the exact
spatial derivative, assuming that $\phi$ and $\theta$ are continuous rather than discrete. Now introduce $s_1 = s_x$ and
$s_2$ = Æ $s_y$, square Eq. (32) and solve for $s_2/s_1$ to obtain

$$\frac{s_2}{s_1} = \text{\AE}\,\sqrt{M^2 - 1}. \tag{33}$$

This representation of the dispersion relation has the properties that $s_1$ and $s_2$ are functions only of the
non-dimensional spatial frequencies $\phi$ and $\theta$, and all the problem and grid dependent terms are contained in the
quantity Æ$\sqrt{M^2 - 1}$, which will be called $\kappa$. Problems with similar values of $\kappa$ should have similar dispersive
behavior.


Method                $s_x / i$                                      Æ $s_y / i$
Exact Derivative      $\phi$                                         $\theta$
Galerkin              $\frac{1}{3}\sin\phi\,(2 + \cos\theta)$        $\frac{1}{3}\sin\theta\,(2 + \cos\phi)$
Cell-Vertex           $\frac{1}{2}\sin\phi\,(1 + \cos\theta)$        $\frac{1}{2}\sin\theta\,(1 + \cos\phi)$
Central Difference    $\sin\phi$                                     $\sin\theta$

Table 1: Spatial Derivative Operators for Various Methods

One can obtain useful information from these plots of $\theta$ vs. $\phi$. The slope of a curve on which $\kappa$ is constant
is the spatial 'group velocity', or the angle at which waves propagate. Waves with large spatial group velocity
(the angle on the $\theta$/$\phi$ plot is close to vertical) will travel at a shallow spatial angle (the wave will move a long
way in x for a little change in y). This allows one to predict where the dispersed waves will appear. Figure 4
shows the contours of constant $\kappa$ for the exact spatial derivative operator. Note that the lines of constant $\kappa$
are straight, indicating that all frequencies travel at the same angle. Moreover, for Æ = 1, the waves have
angle $\tan^{-1}(1/\sqrt{M^2 - 1}) = \sin^{-1}(1/M)$, which is just the Mach angle. This is expected since we considered the
linearised Euler equations.
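
The x-derivative entries of Table 1 can be recovered by applying a difference stencil to a single Fourier mode $\exp i(j\phi + k\theta)$. The stencil weights below are assumptions (the stencil figure did not survive in this copy); they are the usual 3 x 3 weights that reproduce the tabulated symbols with $\Delta x = 1$.

```python
import cmath
import math

# (j, k) offsets -> weight, with a common divisor; assumed stencil weights.
STENCILS = {
    "galerkin": ({(-1, -1): -1, (1, -1): 1, (-1, 0): -4, (1, 0): 4,
                  (-1, 1): -1, (1, 1): 1}, 12.0),
    "cell_vertex": ({(-1, -1): -1, (1, -1): 1, (-1, 0): -2, (1, 0): 2,
                     (-1, 1): -1, (1, 1): 1}, 8.0),
    "central": ({(-1, 0): -1, (1, 0): 1}, 2.0),
}

# s_x / i as listed in Table 1.
TABLE_1 = {
    "galerkin": lambda phi, theta: math.sin(phi) * (2.0 + math.cos(theta)) / 3.0,
    "cell_vertex": lambda phi, theta: math.sin(phi) * (1.0 + math.cos(theta)) / 2.0,
    "central": lambda phi, theta: math.sin(phi),
}


def fourier_symbol(method, phi, theta):
    """Apply the stencil to exp(i(j*phi + k*theta)) at j = k = 0; returns s_x."""
    weights, divisor = STENCILS[method]
    total = sum(w * cmath.exp(1j * (j * phi + k * theta))
                for (j, k), w in weights.items())
    return total / divisor


for name in STENCILS:
    print(name, fourier_symbol(name, 0.4, 0.9), 1j * TABLE_1[name](0.4, 0.9))
```

The y-derivative column follows by exchanging the roles of $\phi$ and $\theta$.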

Figure 5 shows the contours for the Galerkin method. Note that the curves are multiple-valued. For a given
$\phi$ there may be more than one value of $\theta$, or no values of $\theta$. This indicates that there are certain frequencies
which do not propagate with real velocities. Practically, the second value is at a high spatial frequency which
will  be  killed  off  by  the  artificial  damping  in  the  scheme.  The  other  noteworthy  thing  about  Fig.  5  is  that  the 
curves  depart  from  the  exact  Euler  solution  much  later  than  all  the  other  methods.  This  is  due  to  the  fact  that 
on  a  uniform,  Cartesian  mesh,  the  Galerkin  method  is  fourth-order  accurate  for  the  linearised  Euler  equations. 
In  practice,  one  never  sees  this  fourth-order  accuracy,  for  three  reasons.  First,  the  artificial  viscosity  introduces 
some  error  into  the  solution  scheme.  Lindquist  has  shown  [16]  that  artificial  viscosity  can  have  a  dominant  effect 
on  the  solution  error.  Second,  the  grids  used  to  solve  problems  are  very  seldom  Cartesian.  On  a  mesh  composed 
of  anything  other  than  congruent  parallelograms,  fourth-order  accuracy  can  no  longer  be  obtained.  Third,  the 
flux  calculations  for  the  spatial  discretisation  introduce  some  error.  These  effects  combine  to  make  the  Galerkin 
scheme  second-order  accurate  for  practical  problems. 

Figure  6  shows  the  dispersion  plot  for  the  central  difference  method.  Note  that  the  character  of  the  diagram 
is  similar  to  the  Galerkin  plot.  One  would  expect  the  dispersive  behavior  to  be  similar  to  the  Galerkin  dispersive 
behavior,  and  to  some  extent,  this  is  the  case. 

Figure 7 shows the dispersion curves for the cell-vertex scheme. Note that the curves are single-valued. Also
note that the curvature is opposite the curvature for the Galerkin and central difference methods. For a particular
choice of $\kappa$, the dispersion curve for the cell-vertex method will lie on the opposite side of the exact dispersion
line from the curves for the Galerkin and central difference methods. This implies that the oscillations due to
dispersion at a feature (a shock, for example) should appear on the opposite side (ahead or behind) of the feature
compared to the Galerkin and central difference oscillations.

An important application of these curves is the prediction of oscillations due to discontinuities such as shocks.
In some problems, oscillations before or after a shock can cause the solution algorithm to diverge. For example, in
a strong expansion, a post-expansion oscillation may drive the pressure negative, while a pre-expansion oscillation
may not be harmful. The dispersion curves allow one to predict the location of these oscillations and choose a
solution algorithm which will place them in a safe place. The location of oscillations may be predicted by the
following rule: If the $\theta$ vs. $\phi$ curve is concave up, the oscillations will be behind the feature (they travel faster
than the exact solution), and if the curve is concave down, the oscillations will be ahead of the feature. For the
cell-vertex method, this means that one will see pre-feature oscillations for $\kappa$ > 1 and post-feature oscillations for
$\kappa$ < 1. For the Galerkin and central difference methods, this is reversed: $\kappa$ < 1 implies pre-feature oscillations,
and $\kappa$ > 1 implies post-feature oscillations.
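
The rule can be tried out on the two schemes whose dispersion curves have a closed form. The sketch assumes the dispersion relation is read as (Æ $s_y$)/$s_x$ = $\kappa$ with the Table 1 operators, which reproduces the straight exact lines $\theta = \kappa\phi$. For the central difference method this gives $\sin\theta = \kappa\sin\phi$, and for the cell-vertex method the half-angle identity $\sin x/(1 + \cos x) = \tan(x/2)$ gives $\tan(\theta/2) = \kappa\tan(\phi/2)$. The sample frequency $\phi = 0.3$ is an arbitrary low-frequency choice.

```python
import math


def theta_central(phi, kappa):
    """Central difference dispersion curve: sin(theta) = kappa * sin(phi)."""
    return math.asin(kappa * math.sin(phi))


def theta_cell_vertex(phi, kappa):
    """Cell-vertex dispersion curve: tan(theta/2) = kappa * tan(phi/2)."""
    return 2.0 * math.atan(kappa * math.tan(phi / 2.0))


def oscillation_side(theta_curve, kappa, phi=0.3):
    """Concave up (curve above the exact line) means post-feature oscillations."""
    deviation = theta_curve(phi, kappa) - kappa * phi
    return "post-feature" if deviation > 0.0 else "pre-feature"


for kappa in (1.732, 0.6):
    print(kappa,
          "central:", oscillation_side(theta_central, kappa),
          "cell-vertex:", oscillation_side(theta_cell_vertex, kappa))
```

The Galerkin curve has no comparably simple closed form and would need a root finder, so it is left out of this sketch.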
6  Numerical Verification


To  verify  that  the  theory  above  is  the  correct  explanation  of  the  oscillations  observed  in  many  problems, 
several  numerical  experiments  were  made.  All  test  problems  were  for  flow  over  a  wedge  in  a  channel  with  various 
wedge  angles,  inflow  Mach  numbers  and  mesh  aspect  ratios.  Figure  3  shows  the  geometry  and  flow  topology  for 
the 10 degree wedge, $M_\infty = 2$ case discussed below. All the calculations were performed on 50x20 grids, and
result  in  similar  flow  topologies. 

The first set of experiments is for a 1/2 degree wedge angle, with $\kappa$ = 1.732. Figure 8 shows the curves for all
three numerical methods and the exact spatial derivatives on a single plot for this value of $\kappa$. Here it is apparent
that the Galerkin curve stays much closer to the exact curve. Three numerical test cases were run: Mach 2 flow
with Æ = 1; Mach 1.323 flow with Æ = 2 and Mach 3.606 flow with Æ = 1/2. A quick examination of the flow
geometry gives the physical significance of $\kappa$ as the ratio of the number of x grid lines crossed by the feature per
y grid line crossed. In the Mach 3.606 flow, the shock lies at a much shallower angle, so that for a smaller $\Delta y$ the
same crossing ratio is obtained. A similar argument holds for the Mach 1.323 flow.
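
A short check shows that the three test cases do share the same similarity parameter. It assumes $\kappa$ = Æ$\sqrt{M^2 - 1}$ as defined in Section 5; the function name is illustrative.

```python
import math


def similarity_parameter(mach, aspect_ratio):
    """kappa = aspect_ratio * sqrt(M^2 - 1), for supersonic Mach numbers."""
    return aspect_ratio * math.sqrt(mach * mach - 1.0)


# (Mach number, cell aspect ratio) for the three runs described above.
CASES = ((2.0, 1.0), (1.323, 2.0), (3.606, 0.5))

for mach, aspect_ratio in CASES:
    print(mach, aspect_ratio, round(similarity_parameter(mach, aspect_ratio), 3))
```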

Figure 9 shows the Mach number at mid-channel for the central difference method, scaled by the free stream
Mach number for the different Mach numbers above. The central difference method is used here because it exhibits
the most oscillation with the greatest amplitude. The exact Mach number ratios ($M/M_\infty$) for these shocks are
0.991 for $M_\infty$ = 2 and $M_\infty$ = 3.606 and 0.986 for $M_\infty$ = 1.323. These compare well with the actual data, and
explain why two of the curves lie on top of each other. Note that the frequencies of the oscillation are nearly
identical. Also note that the frequency changes slightly as one moves further downstream of the shock. This is
as predicted by the dispersion curve. As one moves downstream the spatial group velocity increases, meaning $\phi$
increases slightly. The wavelength predicted by the dispersion relation at (x, y) = (1.5, 0.5) should be about 10.5
points, and the measured wavelength (crest-to-crest) is either 10 or 11 points, depending on where one defines
the crest.

The  next  set  of  data  shows  the  location  of  the  oscillations  for  the  Mach  number  2  case  with  the  three  methods. 
In  all  the  figures  shown,  the  plot  is  of  Mach  number  at  mid-channel.  Figure  10  shows  the  plot  for  the  Galerkin 
method,  Figure  11  for  the  central  difference  method  and  Figure  12  for  the  cell-vertex  method.  Note  that  both  the 
Galerkin  and  central  difference  methods  exhibit  post-shock  oscillation,  while  the  cell-vertex  exhibits  pre-shock 
oscillation.  Also  note  that  the  frequency  of  the  Galerkin  oscillations  is  much  higher,  and  with  a  lower  amplitude 
than the central difference approximation. This is expected since the Galerkin method group velocity errors occur
at higher spatial frequencies (see Fig. 8). As an interesting aside, note that in Fig. 12 the pre-shock oscillations
from  the  reflected  shock  are  visible  at  the  right  side  of  the  plot.  These  figures  verify  the  use  of  the  dispersion 
curves  to  predict  the  location  of  dispersive  phenomena. 

The  final  test  cases  show  the  Mach  2  flow  over  a  10  degree  wedge,  which  generates  a  shock  wave  with  a  normal 
Mach  number  of  1.27  and  a  density  ratio  of  1.46.  This  case  was  chosen  because  the  problem  starts  to  become 
significantly nonlinear. Figure 13 shows the Mach number on a slice at a 5 degree angle to the x axis for the
Galerkin method and Fig. 14 shows the same data for the cell-vertex method. In these cases, $\kappa$ is about 1.7 ahead
of the first shock, and $\kappa$ is about 0.6 behind the reflected shock (due to the lower Mach number of 1.28 and the
changing aspect ratio of the cells). Note that the oscillation positions in Fig. 14 are correctly predicted to be
before the first shock, where $\kappa$ > 1, and after the second (reflected) shock, where $\kappa$ < 1. Note also that the
frequency of the oscillations has increased. This may be due to the tendency of the nonlinear characteristics to
point into the shock, tending to make the shock sharper. This self-sharpening behavior has almost completely
eliminated  the  dispersive  error  in  the  Galerkin  case,  but  in  Fig.  13  small  low-frequency  oscillations  are  still  visible 
after  the  first  shock  and  before  the  second  shock.  In  many  practical  applications,  the  analyst  is  concerned  with 
the  location  (pre-  or  post-shock)  more  than  the  frequency,  and  the  linearised  analysis  is  still  useful  for  those 
applications. 
7  Conclusions


The  primary  conclusion  of  this  study  is  that  the  low  frequency  oscillations  sometimes  seen  near  shocks  are 
due  to  dispersion  in  the  numerical  scheme.  The  linearised  analysis  presented  gives  one  a  method  for  predicting 
the  location  and  frequency  of  these  oscillations.  The  linear  analysis  is  effective  in  predicting  the  location  of 
oscillations,  even  for  problems  with  significant  nonlinearity.  The  central  difference  finite  element  method  is  shown 
to  be  inferior  to  the  Galerkin  and  cell-vertex  methods  due  to  its  poor  dispersive  behavior.  The  Galerkin  finite 
element  is  shown  to  be  fourth-order  accurate  for  uniform  meshes  and  has  the  lowest  dispersive  error.  The  cell- 
vertex  method  is  shown  to  be  second-order  accurate  for  any  parallelogram  mesh  and  has  moderate  dispersive 
error.  For  the  practical  analyst,  either  Galerkin  or  cell-vertex  provides  adequate  performance. 
Figure 3: Geometry for 10 Degree Wedge Numerical Test Case

Figure 12: Mid-channel Mach Number, M∞ = 2, 1/2 Degree Wedge, Cell-Vertex Method

Figure 13: Mid-channel Mach Number, M∞ = 2, 10 Degree Wedge, Galerkin Method

Figure 14: Mid-channel Mach Number, M∞ = 2, 10 Degree Wedge, Cell-Vertex Method
1 Abstract

An implementation of Jameson's popular cell centered finite volume scheme is presented and
analyzed for its computational requirements. Timings are given on several computers and
compared to LINPACK benchmarks. The program has been analyzed to characterize generic
computational constructs. Three simple loops are identified and have been tested on two
parallel machines, an ALLIANT FX/8 and an Intel iPSC hypercube. Each loop has different
floating point and memory reference features. The most demanding loop, a tridiagonal solver,
requires a special algorithm to perform well on the hypercube. It is suggested that the CFD
community select some standard loops for use in characterizing the computational workloads
inherent in most algorithms.

2  INTRODUCTION 

Computational Fluid Dynamics (CFD) involves the numerical solution of systems of nonlinear
partial differential equations representing the conservation laws of mass, momentum, and
energy applied to a fluid medium. Finite difference, finite volume, or finite element methods
are generally utilized to cast the differential equations into a discrete formulation suitable
for solution on a computer. Due to the nonlinear nature of the equations, an iterative
solution is essential. A variety of approaches are available, but all require large computer
memory and many floating point operations to achieve a solution.

CFD is one field which has driven the development of supercomputers. In the future, CFD must
exploit the potential gains from parallel processors in order that computational simulations
for complex problems can be achieved within realistic times and costs. Many of the algorithms
used in CFD are amenable to almost complete parallel computation, yet the benchmarks of large
codes often produce disappointing results. Most CFD algorithms bear little resemblance to
LINPACK, which is commonly used to represent scientific computing workloads. At present there
is too little communication between the computer scientists capable of designing high
performance parallel computers and the CFD'ers desiring to use such machines. In part, this
may be due to the complexities of the CFD algorithms being used. If simpler descriptions were
available, the communication and progress might improve.

In this paper, a representative problem from one branch of CFD is examined. The particular
problem is the computation of compressible airflow past a transonic transport wing using the
Euler equations. A popular explicit finite volume algorithm due to Jameson and Baker [1] as
coded by Roberts [2] is considered. Timings are given on several computers for a calculation
of the three-dimensional flow past the wing. The code has been analyzed to characterize
generic computational structures. From this, three short loops are identified which represent
the range of computational tasks embedded in the program. The loops are representative of
computational structures in many other CFD codes utilizing structured grid algorithms and
therefore should be of general interest. These loops have been tested on two machines
representing different architectures and the results examined for parallel efficiencies. One
machine, an ALLIANT FX/8, has up to 8 computational elements (CE's) with a shared central
memory and a high speed cache. Each CE is a vector processor, and the compiler handles
concurrency in one of several ways. The other machine, an Intel iPSC, has up to 32 nodes, each
with its own memory, connected in a hypercube topology. The particular model used in this
study is an early and not very powerful one, but our main interest was to test parallel
efficiencies, not actual MFLOPS.

3  THE  ALGORITHM 

In this section a brief account is given of the underlying equations and finite volume
algorithm for which the timings are given and from which the representative loops are
extracted.


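3.1 Governing Equations

The three-dimensional unsteady Euler equations in conservation form are:

$$
\begin{aligned}
\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} + \frac{\partial (\rho v)}{\partial y} + \frac{\partial (\rho w)}{\partial z} &= 0, \\
\frac{\partial (\rho u)}{\partial t} + \frac{\partial (\rho u^2 + p)}{\partial x} + \frac{\partial (\rho u v)}{\partial y} + \frac{\partial (\rho u w)}{\partial z} &= 0, \\
\frac{\partial (\rho v)}{\partial t} + \frac{\partial (\rho u v)}{\partial x} + \frac{\partial (\rho v^2 + p)}{\partial y} + \frac{\partial (\rho v w)}{\partial z} &= 0, \\
\frac{\partial (\rho w)}{\partial t} + \frac{\partial (\rho u w)}{\partial x} + \frac{\partial (\rho v w)}{\partial y} + \frac{\partial (\rho w^2 + p)}{\partial z} &= 0, \\
\frac{\partial (\rho E)}{\partial t} + \frac{\partial (\rho u H_0)}{\partial x} + \frac{\partial (\rho v H_0)}{\partial y} + \frac{\partial (\rho w H_0)}{\partial z} &= 0,
\end{aligned} \tag{1}
$$

with the definition of total enthalpy

$$ H_0 = E + \frac{p}{\rho} \tag{2} $$

and the equation of state for a perfect gas

$$ p = (\gamma - 1)\left[\rho E - \rho\,\frac{u^2 + v^2 + w^2}{2}\right] \tag{3} $$

closing the set. $\gamma$ is a constant equal to 1.4 for air. The equations are commonly
written in vector form:

$$ \frac{\partial U}{\partial t} + \frac{\partial F(U)}{\partial x} + \frac{\partial G(U)}{\partial y} + \frac{\partial H(U)}{\partial z} = 0. \tag{4} $$

U is termed the state vector and represents the 5 dependent variables of the problem. The
components of the flux vectors F, G, H along with the pressure p can be evaluated from the
components of U.

The equations may be integrated over an arbitrary volume V in space bounded by a surface
designated as $\partial V$ with an outward normal vector n as shown in Figure 1. Using the
divergence theorem yields the integral conservation law form of the equations

$$ \frac{d}{dt}\iiint_{V} U \, dV + \iint_{\partial V} (F, G, H) \cdot n \, dA = 0 \tag{5} $$

stating that the time rate of change of mass, momentum, and energy in the volume V is equal
to the integrated flux of the mass, momentum, and energy across the boundaries of V. If the
integrated fluxes equal zero, the solution is termed 'steady'. Appropriate boundary
conditions must be added to represent the body and flight conditions, but these will not be
discussed here.

Figure 1: Control volume


3.2 Discretized Equations

The basic solution scheme is a finite volume spatial discretization with a multi-stage
integration in time as developed by Jameson and Baker [1], which forms the basis of the
popular benchmark code FLO57. The particular implementation given here is due to Roberts [2]
and the exact coding may be somewhat different.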

The three-dimensional volume surrounding a wing is filled with a mesh system, illustrated in
Figure 2, consisting of I x J x K = N hexahedral cells. The arbitrary volume V is then
specialized to a typical cell as shown in Figure 3, and the integral equation replaced by the
semidiscrete difference equation

$$ V_{ijk}\,\frac{dU_{ijk}}{dt} = -\sum_{\ell=1}^{6} \left(F A_x + G A_y + H A_z\right)_{\ell} = -R_{ijk} \tag{6} $$

where $A_x$, $A_y$, $A_z$ are the projected areas in the x, y, z directions of each cell
face. The flux vectors F, G, H are computed as simple averages between the value in cell
i, j, k and its six neighbors i ± 1, j ± 1, k ± 1.

Figure 2: Mesh system

The process of computing the residual $R_{ijk}$ can be done several ways, but the basic
procedure in Roberts' code is as follows:

1. loop over all cells to compute $p_{ijk}$ from the current value of $U_{ijk}$,

2. set $R_{ijk} = 0$ for all cells,

3. set up the boundary conditions,

4. loop over all faces along the i direction and compute the products of the following form:

   (7)

   with similar terms for $G A_y$ and $H A_z$, adding the results to $R_{i,j,k}$ and
   subtracting them from $R_{i+1,j,k}$,

5. repeat the same operations for all faces in the j direction, and then

6. finish with all faces in the k direction, and multiply the final result by 1/2.

Figure 3: Finite volume cell

Virtually  all  CFD  algorithms  used  in  the  aerospace  field 
have  to  compute  the  residual  by  a  similar  procedure. 

The above discretization requires the addition of smoothing or filter or artificial
dissipation terms to damp out high frequency oscillations and to capture shock waves.
Typically these smoothing terms involve second and fourth difference operators in the three
computational coordinate directions. The details of these operators will not be covered here.
The computational structure is similar to the residual calculation given above except that
the difference stencil also includes i ± 2, j ± 2, k ± 2. Otherwise the process of looping
over faces with add and multiply operations is similar. Let the added dissipation operator be
denoted by $D_{ijk}$.

This process creates a large system (5 x N) of coupled nonlinear ordinary differential
equations for the state vector U. To integrate these equations in time, the multistage scheme
of Jameson & Baker [1] is used. Assuming that everything is known at time level n, the values
for U at time level n + 1 are obtained from

$$
\begin{aligned}
U^{(0)} &= U^{n}, \\
U^{(1)} &= U^{(0)} - \alpha_1 \Delta\tau \left(R^{(0)} - D^{(0)}\right), \\
U^{(2)} &= U^{(0)} - \alpha_2 \Delta\tau \left(R^{(1)} - D^{(0)}\right), \\
U^{(3)} &= U^{(0)} - \alpha_3 \Delta\tau \left(R^{(2)} - D^{(0)}\right), \\
U^{(4)} &= U^{(0)} - \Delta\tau \left(R^{(3)} - D^{(0)}\right), \\
U^{n+1} &= U^{(4)},
\end{aligned} \tag{8}
$$

where the subscripts i, j, k have been omitted for clarity. $\alpha_1$, $\alpha_2$,
$\alpha_3$ are constants, and $\Delta\tau = \Delta t / V$. $V_{ijk}$ is computed once during
initialization and $\Delta t_{ijk}$ is computed at the beginning of each multistage iteration
using a formula from linear stability analysis. For time accurate calculations, $\Delta t$ is
set to the minimum over all cells. The artificial smoothing is evaluated only at the initial
stage of the temporal integration to reduce the operation count for the scheme.
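
The stage structure of Equation (8) can be illustrated with a minimal sketch. It is not the FLO57 or Roberts coding: `residual` and `dissipation` are hypothetical caller-supplied stand-ins for R and D, and the default stage coefficients (1/4, 1/3, 1/2) are an assumed, commonly used choice rather than values stated in the text.

```python
import numpy as np

def multistage_step(u, residual, dissipation, dtau, alphas=(0.25, 1.0 / 3.0, 0.5)):
    """Advance u by one time level using the four-stage form of Equation (8).

    residual(u) and dissipation(u) are hypothetical stand-ins for R and D.
    alphas holds the assumed coefficients of the first three stages.
    """
    u0 = u
    d0 = dissipation(u0)            # smoothing is evaluated once, at the initial stage
    for alpha in (*alphas, 1.0):    # the final stage has a unit coefficient
        u = u0 - alpha * dtau * (residual(u) - d0)
    return u

# Usage on the model problem du/dt = -lam * u (so R = lam * u, D = 0).
lam, dtau = 2.0, 0.05
u_new = multistage_step(np.array([1.0, 3.0]), lambda u: lam * u, lambda u: 0.0 * u, dtau)
```

Every stage restarts from the stored level-n state, so only the state, the frozen dissipation, and one residual array need to be kept, which is part of the appeal of the scheme for memory-limited machines.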

Most applications of this algorithm are for problems with a time invariant solution (steady
state), and it is advantageous to use various procedures to accelerate the convergence. In
essence, time becomes only an iteration parameter and intermediate solutions are of no
physical interest. The simplest procedure is to use a different $\Delta t$ in each cell
corresponding to the maximum value allowable from linear stability theory. Another procedure
is "enthalpy damping" which adds certain forcing functions to the state vectors at the end of
each time step. These forcing functions are zero at convergence. Implicit residual smoothing
may also be used by replacing a typical stage of the multistage scheme

$$ U^{(k)} = U^{(0)} + \alpha_k \, \delta U \tag{9} $$

by

$$ U^{(k)} = U^{(0)} + \alpha_k \, \delta U'' \tag{10} $$

where $\delta U''$ is computed from

$$ \left(1 - \epsilon \delta_i^2\right)\left(1 - \epsilon \delta_j^2\right)\left(1 - \epsilon \delta_k^2\right) \delta U'' = \delta U. \tag{11} $$

The $\delta^2$ are second difference operators so that the above equation represents the
product of three scalar tridiagonal matrix equations in the three coordinate directions
solved by three steps
(14)

Tridiagonal inversions also enter CFD codes for factored implicit algorithms of the
Beam-Warming (ADI) class [3], where they may be either scalar or block tridiagonal. Finally,
multigrid methods may be employed.

3.3  Computational  Resources 

Table I shows the floating point operation count and memory references for computing
$U^{n+1}_{ijk}$ given $U^{n}_{ijk}$ for a single cell. It can be seen that about 1500 floating
point operations are required including about 60 divide operations. The number of memory
references (loads and stores) can vary depending upon how a particular machine deals with
temporaries. If a given variable is used more than once within a single loop but is retained
in a register, then about 1100 load/stores are required. However, if that variable has to be
refetched from memory each time it is used, then about 1600 load/stores are needed. Since
accessing memory is usually a bottleneck, this is an important consideration.

Subroutine        FLOP    MR           FLOP/MR
Flux Sum           672    440-716      1.53-1.06
Dissipation        432    171-411      2.52-1.05
Pressure            48    24           2
Time Integ          99    113          .88
Enthalpy Damp       32    11           2.91
Res Smoothing      200    335          .60
Total             1483    1094-1610    1.35-.92

Table I - Floating Point Operations (FLOP) and Memory References (MR) per cell per timestep.

It can be seen that most loops have about one memory reference for each floating point
operation. Loops in which the flux vectors or the pressure are computed from the state vector,
and the enthalpy damping loop, have 1.5 to 3 times as many floating point operations as memory
references. On the other hand, the tridiagonal solvers in the residual smoothing have 50 %
more memory references than floating point operations.

For $10^5$ cells, a moderate level of resolution, about $2 \times 10^3$ time steps are needed
to reach convergence, requiring a total of $3 \times 10^{11}$ floating point operations. About
35 32-bit words of memory are needed per cell for a total memory requirement of 14 megabytes.
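
The totals quoted above follow directly from the per-cell figures in Table I. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
# Per-cell, per-timestep counts taken from Table I.
flop = {"flux": 672, "dissipation": 432, "pressure": 48,
        "time_integ": 99, "enthalpy": 32, "res_smoothing": 200}
mr_low = {"flux": 440, "dissipation": 171, "pressure": 24,
          "time_integ": 113, "enthalpy": 11, "res_smoothing": 335}
mr_high = dict(mr_low, flux=716, dissipation=411)  # temporaries refetched from memory

flop_per_cell = sum(flop.values())                 # 1483
cells, steps = 10**5, 2 * 10**3
total_flop = flop_per_cell * cells * steps         # about 3e11

words_per_cell, bytes_per_word = 35, 4             # 32-bit words
memory_megabytes = words_per_cell * bytes_per_word * cells / 1e6

print(flop_per_cell, sum(mr_low.values()), sum(mr_high.values()))
print(f"{total_flop:.2e} floating point operations, {memory_megabytes:.0f} megabytes")
```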

4  TIMINGS 

Table II shows the performance of the above algorithm without the enthalpy damping and
residual smoothing routines for several computers. 96 x 20 x 20 = 38,400 cells, a medium
level grid, were used for the test case. Halving the grid spacing in each direction to give
307,200 cells would be needed for a truly accurate solution [4]. The calculations were run for
100 time steps to eliminate the set-up overhead. Compiler directives were used as needed on
each machine, and all but one loop can be both vectorized and run concurrently without
recoding. 32 bit precision is adequate for these calculations, but the actual precision which
was used is noted.

For the ALLIANT calculations, good parallel efficiency was achieved up to 5 processors running
in the concurrent-outer, vector-inner (COVI) mode. The lower efficiency for 8 CE's could be
due to two reasons. First, since the outer loop is normally of length 20, using 8 CE's leaves
half the CE's idle on the final concurrent cycle. Secondly, the memory access may have
saturated with the full 8 CE's running. The timings are for the ALLIANT CE's shipped in 1988.


Machine     Config     Code          MFLOPS    EFF
μVAX II     FPA        Scalar(32)       .2     N/A
FX/8        1 CE       Scalar(32)       .9     N/A
            1 CE       Vector(32)      2.2     1.0
            3 CE       COVI(32)        5.7     .85
            5 CE       COVI(32)        9.6     .86
            8 CE       COVI(32)       11.8     .66
CY 205      2 Pipes    Scalar          8.7     N/A
                       VAST II        17.0     N/A
                       Vector(64)     48.0     N/A
                       Vector(32)     93.0     N/A
XMP/48      1 Proc     Vector(64)     62.0     N/A
IBM 3090    1 Proc     Scalar(32)     10.1     N/A
            1 Proc     Vector(32)     24.8     1.0
            2 Proc     COVI(32)       46.3     .94
            3 Proc     COVI(32)       60.8     .82
            4 Proc     COVI(32)       81.4     .82
            5 Proc     COVI(32)       95.0     .76
            6 Proc     COVI(32)       91.7     .62

Table II - Timings (MFLOPS) and parallel efficiencies (EFF) on various computers.

For  the  newly  announced  advanced  CE’s  a  20  %  increase 
in  MFLOPS  was  achieved  with  8  ACE’s. 
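
The first of the two explanations offered for the 8 CE result can be quantified with a small sketch. It assumes the outer loop iterations are dealt out to the CE's in rounds of equal cost, which is an idealization of the COVI mode and not a description of the ALLIANT scheduler; the function names are illustrative. The efficiency definition used (multi-processor rate divided by the processor count times the single-processor vector rate) is inferred from the Table II entries.

```python
import math

def load_balance_bound(n_iterations, n_proc):
    """Upper bound on efficiency when n_iterations equal tasks run in rounds on n_proc processors."""
    rounds = math.ceil(n_iterations / n_proc)
    return n_iterations / (rounds * n_proc)

def parallel_efficiency(mflops_p, n_proc, mflops_1):
    """Measured rate on n_proc processors relative to perfect scaling of the one-processor rate."""
    return mflops_p / (n_proc * mflops_1)

for ce in (3, 5, 8):
    print(ce, round(load_balance_bound(20, ce), 3))

# FX/8 entries from Table II: 11.8 MFLOPS on 8 CE's against 2.2 MFLOPS on 1 CE.
print(round(parallel_efficiency(11.8, 8, 2.2), 2))
```

With an outer loop of length 20, the bound is 1.0 for 5 CE's but 20/24 for 8 CE's, so load imbalance alone cannot account for the measured .66; this is consistent with the suggestion that memory saturation also contributes.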

The CYBER 205 calculations were run at the John von Neumann Center at Princeton University.
The VAST II vectorizer produced only about twice the speed of the scalar version. However, a
hand-vectorized [5] version which required about 3 months of human time achieved very good
performance, particularly with half precision (32 bit). The timing on the NASA Ames CRAY
XMP/48 is for a single processor. The code originally written by Roberts was optimized for
this machine.

Finally, the IBM timings predate the recently released concurrent compiler for the 3090 and
were done by inserting fork/join statements by hand. Some additional temporary variables were
defined to avoid multiple divide operations which the compiler didn't optimize out. Good
efficiency was achieved up to 5 processors, with the performance for the 6 processor
configuration dropping off, presumably due to the same reasons noted for the ALLIANT.


Machine     Config     EULER    LIN(64)    LIN(32)
μVAX II     FPA          .20       .13        .17
FX/8        1 CE         2.2       1.6        1.6
            8 CE        11.8       7.6        7.6
CY 205      2 Pipes     17.0      17.0
XMP/48      1 Proc      62.0      39.0
IBM 3090    1 Proc      24.8      12.0       13.0

Table III - Timings (MFLOPS) for present code EULER2 and LINPACK in full (64) and half (32)
precision.
Finally, it is interesting to compare the timings for the present code with those of
LINPACK [6] in full and half precision as given in Table III. In some cases, the timings are
comparable, while in others they can differ by a factor of two. It was assumed that the
LINPACK numbers on the CYBER 205 were obtained with the VAST II vectorizer.

5 CFD LOOPS

Three loops have been selected which represent the range of constructs embedded in the
algorithm described in Section 3. The loops have been coded and run on the ALLIANT and the
Intel iPSC to highlight performance parameters. All calculations have been done with 32 bit
words. Listings of the actual FORTRAN code are not included here as it is felt these results
are still preliminary and should be discussed within the CFD community before they become
adopted as standard test cases.

5.1 Loop 1

The first loop is painfully simple and consists of computing the pressure in all the cells
from Equation (3) given the components of the state vector
$U = [\rho, \rho u, \rho v, \rho w, \rho E]^T$. This loop also represents the operations
needed to compute the flux vectors F, G, H. $\gamma - 1$ is a constant computed and stored at
the beginning of the calculation. By defining temporary variables for u, v, w only one divide
is needed. For each cell, there are 9 floating point operations, 5 fetches and 1 store. There
are no dependencies which inhibit parallel and vector calculation, and the calculation is
completely local in that no data is required from neighboring cells. It is a reasonable test
of raw computing speed for a given machine.

          64 x 64 x 64             32 x 32 x 64
CE's      MFLOPS    EFF            MFLOPS    EFF
1           2.5     1.0              2.5     1.0
2           4.7     (illegible)      4.7     .93
4           8.6     (illegible)      8.8     .86
8          14.9     .74             15.2     .75

Table IV - Timings for Loop 1 on ALLIANT FX/8.

Table IV shows the results for this loop on the ALLIANT FX/8. It is interesting to note that
even for a loop this simple, efficiencies significantly lower than unity are realized due
either to overhead in controlling 8 different CE's and/or saturation of the memory bus. It is
also interesting to see that the overall megaflop performance is close to that realized for
the complete code shown in Table II. The timings on the Intel hypercube show parallel
efficiencies of unity up to 32 nodes since the loop is local in nature and requires no
internodal traffic except for setup. Each node ran at 0.041 MFLOPS.

5.2 Loop 2

The second loop is the flux summation given by Equation (6) for a two space dimension problem.
This loop is representative of computational tasks in forming the residual R and dissipation D
vectors given the values of the state vector, the flux vectors, and the projected areas. The
loop involves many add-multiply operations which are a commonly occurring construct in many
algorithms. The loop has 60 add/subtract, 36 multiplies, 4 stores and 49-100 loads depending
upon the ability of the processor to keep temporaries local. A complete description of the
loop and the results is given by Modiano [7], and only the results will be reported here.

          Scalar                Vector
CE's      MFLOPS    EFF         MFLOPS    EFF
1          1.33     1.00         3.47     1.00
2          2.63     .985         6.84     .985
4          5.22     .978        13.1      .948
8         10.2      .949        23.5      .849

Table V - Timings for Loop 2 on ALLIANT FX/8 in scalar and vector mode.

Table V presents the results on the ALLIANT FX/8 for a 64 x 64 grid. A study was done on 3
CE's with a variable number of grid points and no change in performance was noted until the
main memory was exceeded. In scalar mode, very good efficiencies are obtained, while in vector
mode some loss of efficiency is experienced with 8 CE's. This may simply be due to saturation
of the memory bus. The peak speed obtained of 23.5 MFLOPS represents 25 % of the theoretical
maximum of 94.4 MFLOPS. The higher MFLOP numbers for this loop compared to loop 1 are probably
due to the absence of divides, the frequency of add-multiply triads which are a vector
instruction, and the reuse of variables in cache.
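
The kind of work performed by Loops 1 and 2 can be sketched for a two-dimensional problem as follows. This is not the benchmark FORTRAN, which the paper deliberately omits; it is an illustrative NumPy version under stated assumptions: a four-component state vector (rho, rho u, rho v, rho E), a uniform Cartesian grid so that the projected areas reduce to dy on i-faces and dx on j-faces, and interior faces only (no boundary conditions). All function names are hypothetical.

```python
import numpy as np

GAMMA = 1.4

def pressure(U):
    """Loop 1 analogue: pressure from the state vector, with a single divide per cell."""
    rho, mx, my, rho_e = U
    inv_rho = 1.0 / rho
    return (GAMMA - 1.0) * (rho_e - 0.5 * (mx * mx + my * my) * inv_rho)

def flux_vectors(U):
    """Flux vectors F and G evaluated from the components of U."""
    rho, mx, my, rho_e = U
    inv_rho = 1.0 / rho
    u, v = mx * inv_rho, my * inv_rho
    p = pressure(U)
    F = np.array([mx, mx * u + p, my * u, (rho_e + p) * u])
    G = np.array([my, mx * v, my * v + p, (rho_e + p) * v])
    return F, G

def flux_sum(U, dx, dy):
    """Loop 2 analogue: accumulate face fluxes into the residual R (interior faces only)."""
    F, G = flux_vectors(U)
    R = np.zeros_like(U)
    f_face = 0.5 * (F[:, :-1, :] + F[:, 1:, :]) * dy   # i-faces: A_x = dy, A_y = 0
    R[:, :-1, :] += f_face                             # flux leaving cell i
    R[:, 1:, :] -= f_face                              # same flux entering cell i+1
    g_face = 0.5 * (G[:, :, :-1] + G[:, :, 1:]) * dx   # j-faces: A_x = 0, A_y = dx
    R[:, :, :-1] += g_face
    R[:, :, 1:] -= g_face
    return R

# Usage: a uniform flow must give zero residual in every interior cell.
U = np.empty((4, 6, 5))
U[0], U[1], U[2], U[3] = 1.0, 0.5, 0.2, 2.5
R = flux_sum(U, dx=0.1, dy=0.2)
```

Each face flux is computed once and used by the two cells sharing the face, which is the add-to-one-cell, subtract-from-the-neighbor pattern described in Section 3.2.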


The ALLIANT has a high speed cache between the CE's and the global memory. The global memory
is logically divided into blocks corresponding to the size of the cache, and an address
location in cache is associated with the corresponding location within each block of the
memory. When a load request is made, the cache is first examined to determine if the variable
is present. If not, a block of 8 32-bit words including the requested variable is moved to
cache. If the next variable requested by the processor is adjacent to the previous one,
chances are it is already in cache and does not need to be fetched from main memory (a cache
hit). However, one can imagine the worst case of storing data in main memory such that each
request overwrites the previously moved data and there are never any cache hits. To illustrate
this, the same 64 x 64 problem was done on 3 CE's, but the size of the array dimension was
varied from 65 to 1025 in each of the j, k indices. Table VI presents the results and
illustrates that in some circumstances, a high performance penalty can be paid. Spreading the
data out in main memory in the j direction (inner variable) resulted in a sharp decrease in
the realized MFLOP performance from 8.42 MFLOPS to as small as 1.38 MFLOPS. This example
illustrates the need for application programmers to pay attention to architectural issues and
for the manufacturers to clearly explain the performance principles of each machine.

NK \ NJ      65      129      257      513     1025
65          3.44     8.25     3.57     2.51     1.62
129         8.47     6.08     3.15     3.12     1.38
257         7.44     5.17     2.50     1.91     1.52
513         7.23     3.07     2.65     2.30     1.64
1025        7.14     5.03     (illegible)  3.24  2.33

Table VI - Timings for Loop 2 on 3 CE ALLIANT FX/8 with variable indexing of arrays, COVI.
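
The best and worst cases described above can be reproduced with a toy model of a direct-mapped cache. The 8-word line length follows the text; the cache capacity of 1024 words is a hypothetical value chosen only for illustration, and the model ignores everything else about the real FX/8 memory system.

```python
def hit_rate(addresses, cache_words=1024, line_words=8):
    """Fraction of loads served from a direct-mapped cache (toy model, assumed capacity)."""
    n_lines = cache_words // line_words
    resident = [None] * n_lines          # which memory block each cache line currently holds
    hits = 0
    for addr in addresses:
        block = addr // line_words       # 8-word block containing the requested word
        slot = block % n_lines           # the only cache line that block may occupy
        if resident[slot] == block:
            hits += 1
        else:
            resident[slot] = block       # miss: the block overwrites the previous occupant
    return hits / len(addresses)

sequential = list(range(4096))                   # adjacent words, as in a unit-stride sweep
conflicting = [k * 1024 for k in range(64)]      # stride equal to the cache size
print(hit_rate(sequential), hit_rate(conflicting))
```

A unit-stride sweep misses only on the first word of each line, while a stride equal to the cache size sends every request to the same line and never hits, which is the mechanism behind the degradation seen when the array dimensions are padded.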

Loop 2 was run on the Intel iPSC hypercube using domain decomposition. Since the
interprocessor communication time is much longer than access time for the local memory on each
node, as much data as possible should be kept local. By partitioning the computational domain
into subdomains and putting one on each node, only a limited amount of boundary information
data needs to be transferred between nodes. A significant amount of syntax must be added to
the code to handle initialization and message passing, but the computational strategy is
simple.


          32 x 32 = 1024 cells      64 x 64 = 4096 cells
Nodes     MFLOPS    EFF             MFLOPS    EFF
1          .055     1.00
2          .055     .995
4          .054     .986
8          .053     .968             .055     .996
16         .051     .925             .054     .989
32         .042     .755             .054     .979

Table VII - Timings for Loop 2 on Intel iPSC. MFLOPS is effective performance per node.


Table  VII  presents  the  timing  results  which  show  that 
as  long  as  each  computational  node  has  enough  cells  so  that 
the  boundary  data  transmission  is  minor,  excellent  parallel 
efficiency  can  be  achieved.  The  aggregate  MFLOP  rate  is 
obtained  by  multiplying  by  the  number  of  nodes. 
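
The dependence of efficiency on the number of cells per node can be made concrete with a rough surface-to-volume estimate. The sketch assumes a rectangular px-by-py block decomposition and counts one layer of neighbor cells along each of the four block edges; the actual partitioning used on the iPSC is described in [7] and may differ. Function names are illustrative.

```python
def halo_ratio(nx, ny, px, py):
    """Boundary cells exchanged per owned cell for an interior block (assumed block layout)."""
    bx, by = nx // px, ny // py          # cells owned by one node in each direction
    halo = 2 * (bx + by)                 # one layer of neighbor cells on four edges
    return halo / (bx * by)

def aggregate_mflops(per_node_mflops, nodes):
    """Aggregate rate from the per-node figures of Table VII."""
    return per_node_mflops * nodes

print(halo_ratio(32, 32, 8, 4))          # 1024 cells on 32 nodes
print(halo_ratio(64, 64, 8, 4))          # 4096 cells on 32 nodes
print(aggregate_mflops(0.042, 32), aggregate_mflops(0.054, 32))
```

Under this estimate the small grid on 32 nodes exchanges 0.75 boundary cells per owned cell against 0.375 for the larger grid, in line with the lower efficiency measured for the 32 x 32 case.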

5.3  Loop  3 

The third loop which has been tested consists of the three scalar tridiagonal matrix
inversions for the implicit residual smoothing given by Equations 11 - 14. A detailed report
of the approaches and listing of the FORTRAN programs for doing this on the two processors is
given by Haimes, Giles, and Murman [8]. Only an outline of the approach and summary of results
will be presented here.


          64 x 64 x 64             32 x 32 x 64
CE's      MFLOPS    EFF            MFLOPS    EFF
1           1.4     1.0              1.2     1.0
2           2.6     .94              2.3     .96
4           4.5     (illegible)      3.0     (illegible)
8           6.8     .60              4.0     (illegible)

Table VIII - Timings for Loop 3 on ALLIANT FX/8 in vector mode.

For the ALLIANT, the familiar Thomas algorithm, a Gauss elimination procedure for a sparse
matrix, was used. A straightforward application with the inner loop being the forward-backward
elimination step leads to dependencies which inhibit both vectorization and concurrency. This
can be avoided by making the outer loop the elimination direction and the inner two loops the
other directions. Additional storage is then required for temporary variables, but the loop
vectorizes and runs in parallel. Table VIII presents the results for the ALLIANT. Due to the
fact that there is a divide operation and considerable memory traffic (see Table I), the
overall performance and efficiency are much less than the other two loops, and there is also a
performance penalty for the smaller problem, probably due to poor use of the cache.
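
The loop interchange described above can be sketched in NumPy, where each statement inside the elimination loop operates on all lines at once and so plays the role of the vectorized inner loops. This is an illustration, not the code of [8]: it solves one factor $(1 - \epsilon\delta^2)x = r$ of Equation (11) along the first array axis, and the boundary treatment (values outside the line taken as zero) is an assumption.

```python
import numpy as np

def smooth_lines(rhs, eps):
    """Thomas algorithm for (1 - eps*delta^2) x = rhs along axis 0, all lines together.

    rhs has shape (n, m): n points in the elimination direction, m independent lines.
    """
    n = rhs.shape[0]
    off = -eps                      # sub- and super-diagonal entries
    diag = 1.0 + 2.0 * eps          # diagonal entry
    c_star = np.empty(n)            # modified super-diagonal (same for every line)
    d_star = np.empty(rhs.shape)    # extra temporary storage, one value per unknown
    c_star[0] = off / diag
    d_star[0] = rhs[0] / diag
    for i in range(1, n):           # outer loop: forward elimination direction
        denom = diag - off * c_star[i - 1]
        c_star[i] = off / denom
        d_star[i] = (rhs[i] - off * d_star[i - 1]) / denom   # vector over all lines
    x = np.empty(rhs.shape)
    x[-1] = d_star[-1]
    for i in range(n - 2, -1, -1):  # outer loop: back substitution
        x[i] = d_star[i] - c_star[i] * x[i + 1]              # vector over all lines
    return x

# Usage: 4 independent lines of 6 points each.
rng = np.random.default_rng(0)
rhs = rng.standard_normal((6, 4))
x = smooth_lines(rhs, eps=0.5)
```

The recurrence in i remains sequential, but since it is now the outer loop, the work at each step is a dependency-free sweep across the other directions, at the cost of the `d_star` temporary array.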

For the Intel hypercube, an application of the traditional Thomas procedure will lead to
terrible performance as there is no way to organize the data which avoids significant
internode traffic in at least two of the three directions. A special algorithm had to be
devised which divided each line into subdomains. The forward solve is done in each subdomain,
temporarily treating the variables adjacent to the dividing locations as known parameters.
This leads to a reduced problem for these variables which can be quickly solved, and the
backward substitution is done. The algorithm is fully explained in Haimes et al. [8]. About
twice the number of operations are required compared to the standard Thomas algorithm, and the
coding complexity is much greater, but excellent parallel efficiency is achieved on a
distributed memory architecture. For a 64 x 64 x 64 grid, 51.2 seconds were required on a 32
node configuration, while reducing the number of grid points by a factor of 4 and running on
8 nodes yielded 50.6 seconds.

The tridiagonal solver is the most involved of the three loops, yet is a common algorithm for
many applications. The performance on the ALLIANT could undoubtedly be improved by careful
blocking of the problem size to stay within the cache size. On the hypercube, good performance
can be achieved with a considerable expenditure of programming effort. Both of these lead to
the observation that a standard utility should be provided on each machine to free the
applications programmer from needing to be an expert on these subjects.


6  UNSTRUCTURED  GRIDS 


Machine  j  Config 

Code 

FEM 

LIN(32) 

pV AX  II 

FPA 

Scalar 

.14 

.17 

FX/8 

1  CE 

3  CE 

Vector 

Vector- 

Concur. 

1.2 

3.0 

1.6 

IBM  3090 

1  Proc 

Vector 

7.5 

13.0 

7 

The  results  reported  in  this  paper  are  considered  as  a 
preliminary  step  to  provide  information  needed  to  assist  in 
the  design  and  evaluation  of  parallel  processor  computers 
for  CFD  applications.  A  popular  CFD  algorithm  has  been 
analyzed  to  illustrate  the  computational  requirements,  and 
timings  have  been  compared  on  several  machines  and  with 
LINPACK  benchmarks.  Three  loops  have  been  identified 
which  represent  typical  computational  workloads  of  this  al¬ 
gorithm.  They  also  represent  generic  tasks  in  most  CFD 
algorithms.  The  loops  have  been  tested  on  two  parallel 
processors  to  illustrate  the  parallel  efficiencies.  It  would  be 
helpful  if  the  CFD  community  could  decide  on  several  stan¬ 
dard  loops  to  assist  in  the  design  and  evaluation  of  future 
parallel  computers. 
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Table  DC  -  Timings  (MFLOPS)  for  Finite  Element  Code 
and  LLNPACK  in  half  (32)  precision. 


Currently a number of investigators are developing CFD
algorithms using unstructured grids for which i,j,k indexing
may not be used. Such approaches utilize either tetrahedra
or hexahedra, and often employ adaptive grid approaches
wherein cells are subdivided during the iterative
solution. Unstructured grids require connectivity matrices
to identify which node points are at the vertices of the
cells. This “indirect addressing” requires gather-scatter operations
which generally lead to a reduced computational
throughput. Table IX presents timings on three computers
for a finite element method which is similar to the algorithm
outlined in Section 3, but which has an unstructured grid.
Details of the algorithm are given by Shapiro and Murman
[9]. It can be seen that throughput on both the ALLIANT
and IBM 3090 is considerably different with the unstructured
grids compared to the structured grids. This points
to the need for a loop of this variety in order to highlight
the features of such computational tasks.
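
The indirect addressing described above can be illustrated with a minimal sketch. The four-node, two-triangle mesh and the function names below are hypothetical and are not taken from the finite element code timed in the table; the sketch only shows how a connectivity table drives the gather and scatter steps.

```python
# Hypothetical mesh: 4 nodes, 2 triangular cells (illustrative values only).
node_values = [1.0, 2.0, 4.0, 8.0]        # one scalar stored per node
connectivity = [(0, 1, 2), (1, 3, 2)]     # node indices at each cell's vertices


def gather_cell_averages(values, cells):
    """Gather: read nodal values through the connectivity table."""
    return [sum(values[n] for n in cell) / len(cell) for cell in cells]


def scatter_to_nodes(cell_values, cells, num_nodes):
    """Scatter: accumulate an equal share of each cell value onto its nodes."""
    totals = [0.0] * num_nodes
    for value, cell in zip(cell_values, cells):
        for n in cell:
            totals[n] += value / len(cell)
    return totals


cell_avg = gather_cell_averages(node_values, connectivity)
node_sum = scatter_to_nodes(cell_avg, connectivity, len(node_values))
```

Because every access goes through `connectivity`, memory references are irregular, which is the usual reason such loops vectorize less effectively than i,j,k-indexed loops.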




References 


[1] A. Jameson and T.J. Baker, Solution of the Euler Equations
for Complex Configurations, AIAA-83-1929CP,
July 1983.

[2] T.W. Roberts, Euler Equation Computations for the
Flow Over a Hovering Helicopter Rotor, PhD thesis,
Massachusetts Institute of Technology, November 1986.
Also MIT CFDL TR-86-5.

[3] R. Beam and R. Warming, “An Implicit Factored
Scheme for the Compressible Navier-Stokes Equations,”
AIAA Journal, Vol. 16, 1978, pp. 393-402.

[4] A. Rizzi, “Three-Dimensional Solutions to the Euler
Equations with One Million Grid Points,” AIAA Journal,
Vol. 23, 1985, pp. 1986-1987.

[5] Robert E. Malecki, Euler Equation Calculations for a
Cropped Delta Wing Using the CYBER 205, Master’s
thesis, Massachusetts Institute of Technology, August
1986. Also MIT CFDL TR-86-4.

[6] Jack J. Dongarra, Performance of Various Computers
Using Standard Linear Equations Software in a FORTRAN
Environment, Technical Report Tech Memo No.
23, Argonne National Laboratory, October 1987.

[7] David Modiano, Performance of a Common CFD
Loop on Two Parallel Architectures, Technical Report
CFDL-TR-87-11, MIT, 1987.

[8] Robert Haimes, Michael Giles, and Earll Murman,
Performance of Implicit Residual Smoothing on Multiprocessor
Machines, Technical Report CFDL-TR-88-2,
MIT, February 1988.

[9] R. Shapiro and E. Murman, Adaptive Finite Element
Methods for the Euler Equations, AIAA-88-0034,
January 1988.




APPENDIX 


Lecture Notes in Physics, Vol. 323
Editors: D.L. Dwoyer, M.Y. Hussaini & R.G. Voigt
Springer-Verlag 1989, pp. 326-330


11th  INTERNATIONAL 
CONFERENCE  ON  NUMERICAL 
METHODS  IN  FLUID  DYNAMICS 


Unsteady  and  Turbulent  Flow  using  Adaptation  Methods 

John  G.  Kallinderis*  and  Judson  R.  Baron* 

Computational Fluid Dynamics Lab, Dept. of Aeronautics and Astronautics
Massachusetts  Institute  of  Technology,  Cambridge,  MA  02139 

INTRODUCTION 

Adaptive  grid  embedding  has  proven  to  be  an  efficient  approach  for  resolving  important  flow 
features.  The  method  was  originally  applied  to  inviscid  flow  involving  shocks  as  the  main  feature 
[3,2]. It has recently been extended to include viscous regions with multiple scale phenomena such
as shocks and boundary layers [4]. There the directional embedding concept was introduced in
regions  where  significant  flow  gradients  exist  only  in  one  direction  (e.g.  boundary  layers).  Equation 
adaptation  also  was  used  to  limit  the  Navier  Stokes  equations  to  only  appreciably  viscous  regions, 
while  for  the  remaining  areas  the  Euler  equations  were  solved.  Lastly,  a  new  finite  volume, 
conservative  scheme  was  developed  to  discretize  the  viscous  terms  [4],  while  retaining  the  inviscid 
terms discretization due to Ni [6].

The  present  work  extends  the  previous  adaptation  methodology  and  numerical  scheme  to  such 
complex  flows  as  airfoils  in  turbulent  flow.  The  detection  procedure  tracks  both  flow  features  and 
their  directionality,  and  defines  embedding  patches  which  act  as  ‘filters’  to  reduce  the  number  of 
‘randomly’  embedded  cells.  Also  a  new  and  general  way  of  implementing  an  algebraic  turbulence 
model  for  unstructured  grids  (quadrilaterals  or  triangles,  the  former  emphasized  here)  is  described 
and  its  accuracy  is  evaluated.  For  unsteady  simulations,  it  is  shown  how  one  may  allow  a  spatial 
variation  of  the  time  steps  while  simultaneously  maintaining  time  accuracy.  The  flow  field  is 
divided into temporal levels which coincide with the embedding levels. The accuracy and efficiency
of the method are examined for a forced oscillation model problem. The interface treatment is
essential for an accurate and robust procedure. Therefore a conservative interface treatment
is presented and compared with a nonconservative treatment for steady state problems in both
subsonic and supersonic flow.

In the following, the concept of embedding patches, the Baldwin-Lomax turbulence model
implementation for unstructured meshes, the method of spatially varying time steps, and finally a
conservative interface treatment are all described and evaluated.

EMBEDDING  PATCHES 

An important consideration when constructing embedded grids relates to the determination
of a suitable threshold for the feature detection parameter(s). Additional, extraneous cells may
appear near features and are essentially embedded ‘noise cells’. Alternatively, cells which should
be divided are often overlooked [3]. This characteristic behavior becomes even more severe with
directional embedding.

An embedded patch encloses a fixed, predefined number of cells of the initial mesh. It essentially
scans the domain, and during the scan the cells included in each patch are examined. If a large
majority of its cells (e.g. 90%) are flagged for division, then all cells currently belonging to the
patch are flagged for division. Conversely, if very few cells (e.g. 10%) are flagged, none within the
patch are embedded. When repeated throughout the domain, the patch procedure can be viewed
conceptually as a ‘searching window’, and acts somewhat like a ‘noise filter’ which reduces the
number of ‘randomly’ embedded cells.
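
The patch rule can be sketched as follows. This is only an illustration under stated assumptions: the initial mesh is taken to be a logically rectangular array of cells, the patches are non-overlapping square tiles (the text does not say whether the scanning windows overlap), and the function name and the `high`/`low` parameters are hypothetical, set to the 90% and 10% examples quoted above.

```python
def apply_patch_filter(flags, patch_size, high=0.9, low=0.1):
    """Return a filtered copy of flags[i][j] (True means 'divide this cell').

    Assumption: patches are non-overlapping patch_size x patch_size tiles.
    """
    rows, cols = len(flags), len(flags[0])
    result = [row[:] for row in flags]
    for i0 in range(0, rows, patch_size):
        for j0 in range(0, cols, patch_size):
            cells = [(i, j)
                     for i in range(i0, min(i0 + patch_size, rows))
                     for j in range(j0, min(j0 + patch_size, cols))]
            fraction = sum(flags[i][j] for i, j in cells) / len(cells)
            if fraction >= high:        # nearly full patch: flag every cell
                for i, j in cells:
                    result[i][j] = True
            elif fraction <= low:       # nearly empty patch: remove 'noise' cells
                for i, j in cells:
                    result[i][j] = False
    return result


# Two 4x4 patches: the left has one missed cell, the right has one noise cell.
raw = [[True] * 4 + [False] * 4 for _ in range(4)]
raw[0][0] = False
raw[2][6] = True
filtered = apply_patch_filter(raw, patch_size=4)
```

Patches whose flagged fraction falls between the two thresholds are left unchanged, so genuine feature boundaries are not altered.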

The  illustrative  case  of  a  flow  around  an  NACA  0012  airfoil  shows  the  effectiveness  of  the 
method.  Fig.  5(a)  shows  Mach  number  contours  just  before  a  second  level  of  embedding.  The 
dominant  features  are  a  normal  shock  on  the  suction  side  and  the  upper  and  lower  surface  boundary 
layers.  Fig.  5(b)  shows  the  resulting  grid.  Directional  embedding  is  present  above  and  below  the 
airfoil  while  the  leading  edge  region  is  embedded  in  both  directions.  The  downstream  region  lying 
between  the  two  shear  layers  is  not  embedded.  Comparison  of  the  grid  and  the  resulting  solution 
shows  that  both  the  features  and  their  directionality  are  faithfully  captured.  The  borders  between 
the  different  regions  are  fairly  free  from  ‘noise’  cells. 

TURBULENCE  MODELING  WITH  UNSTRUCTURED  MESHES 

The  algebraic  model  due  to  Baldwin  and  Lomax  [1]  was  used  as  a  turbulent  flow  description. 
The  model  implicitly  assumes  a  structured  mesh,  and  its  implementation  is  usually  along  lines 
normal  to  the  surface.  For  an  unstructured  mesh  (quadrilateral  or  triangular),  such  normal  mesh 
lines  generally  do  not  exist.  In  the  case  of  quadrilaterals,  for  example,  interfaces  interrupt  such 
lines  (Fig.  1). 

Our  approach  implements  the  model  in  a  ‘cell-wise’  manner.  All  necessary  quantities  are 
calculated  at  the  center  of  each  cell.  In  this  way  we  avoid  using  information  from  outside  of  the  cell, 
an  approach  which  is  common  when  dealing  with  unstructured  meshes  generally.  For  example, 
vorticity, which is an important quantity for both the inner and outer layer formulations of the
model, is calculated using Green’s theorem over each cell. Specifically, $\omega = -(1/S_{cell}) \oint_{cell} (u\,dx + v\,dy)$,
where $S_{cell}$ is the cell area. The distance of each cell from the wall is calculated and stored
whenever  the  grid  is  updated.  The  only  quantities  that  require  information  from  outside  of  each 
cell  in  order  to  be  evaluated  are  the  Baldwin-Lomax  parameters  Fmax  and  Udiff  which  are  used 
for  the  outer  layer.  These  are  evaluated  by  examining  all  cells  associated  with  each  station  (cells 
A,B,C,D,E  in  Fig.  1).  The  assumption  is  that  they  exhibit  no  significant  streamwise  variation 
over  cell  E. 
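
A minimal sketch of the cell-wise vorticity evaluation is given below. It assumes vertices listed counterclockwise and a trapezoidal rule along each edge, and it returns the circulation per unit area, which equals dv/dx - du/dy for that orientation; the sign appearing in the text depends on the traversal direction and sign convention used in the original code, which cannot be recovered here. The function names are illustrative.

```python
def cell_area(vertices):
    """Shoelace formula; positive for counterclockwise vertex ordering."""
    n = len(vertices)
    return 0.5 * sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))


def cell_vorticity(vertices, velocities):
    """Circulation of (u, v) around the cell divided by the cell area."""
    n = len(vertices)
    circulation = 0.0
    for i in range(n):
        j = (i + 1) % n
        dx = vertices[j][0] - vertices[i][0]
        dy = vertices[j][1] - vertices[i][1]
        u_mid = 0.5 * (velocities[i][0] + velocities[j][0])  # edge-averaged u
        v_mid = 0.5 * (velocities[i][1] + velocities[j][1])  # edge-averaged v
        circulation += u_mid * dx + v_mid * dy
    return circulation / cell_area(vertices)


# Unit square cell, vertices counterclockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rotation = [(-y, x) for x, y in square]   # rigid rotation: u = -y, v = x
shear = [(y, 0.0) for x, y in square]     # simple shear:   u = y,  v = 0
omega_rotation = cell_vorticity(square, rotation)
omega_shear = cell_vorticity(square, shear)
```

Only the nodal values of the cell itself are used, which is what makes the evaluation independent of the mesh structure.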

The case of a NACA 0012 airfoil in a subsonic turbulent flow tests the accuracy of the method.
An initial C-mesh of 33x17 points with two levels of embedding was used for the calculation. In
Fig. 4 the pressure coefficient distribution and the lift compare favorably with experiment [8].

UNSTEADY  FLOW  WITH  SPATIALLY  VARYING  TIME  STEPS 

The  globally  minimum  time  step  required  for  an  unsteady  calculation  poses  a  serious  limitation 
to  the  application  of  explicit  methods.  We  suggest  a  procedure  which  allows  a  spatial  variation  of 
the  time  steps  while  simultaneously  maintaining  time  accuracy.  To  accomplish  this  the  flow  field 
is  divided  into  temporal  levels  which  coincide  with  the  embedding  levels  (Fig.  2).  The  same  time 
step is assigned at each level but between two consecutive levels the time steps vary by a factor
of two. For the example in Fig. 2, the smaller cells of level 1 are integrated in time twice using a
time step $\delta t$ before the larger cells of level 0 are integrated using a time step $2\delta t$ [5].
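
The ordering of the integrations can be sketched with a short recursive schedule. This is an illustrative sketch only: `advance` is a hypothetical name, the flow update itself is replaced by a log entry, and the interface treatment between levels is omitted.

```python
def advance(level, max_level, dt, log):
    """Advance `level` by dt after the next finer level has taken two steps of dt/2."""
    if level < max_level:
        advance(level + 1, max_level, dt / 2, log)
        advance(level + 1, max_level, dt / 2, log)
    log.append((level, dt))  # placeholder for the actual cell update on this level


# Two embedding levels above the base mesh; the coarsest cells use a step of 2.0.
schedule = []
advance(level=0, max_level=2, dt=2.0, log=schedule)
levels_in_order = [lvl for lvl, _ in schedule]
```

With one embedded level this reproduces the sequence described above: two steps of $\delta t$ on level 1 followed by one step of $2\delta t$ on level 0, so that every level reaches the same time at the end of the coarse step.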

Spatial  adaptation  (cell  subdivision  to  resolve  spatial  gradients)  generally  results  in  temporal 
adaptation  (reduction  of  the  time  step  to  resolve  temporal  gradients)  as  well  [7].  Large  temporal 
gradients  may  exist  in  regions  where  features  are  present.  However,  a  special  algorithm  for  de¬ 
tection  of  regions  of  these  temporal  gradients  is  frequently  an  unnecessary  burden  since  the  latter 
are  included  in  the  spatially  embedded  regions.  This  results  in  a  significant  simplification  of  the 
data  structure  required  to  handle  adaptation. 

As a test of the time accuracy of the method, we consider the case of a channel flow with
a forced oscillation of the inlet Mach number (Fig. 6). A low Reynolds number (Re = 10*)
reduced CPU time. Using one level of embedding, the time history of the u-velocity component
at a point in the domain was tracked. Free stream flow was used as the initial condition. The initial
transient decayed after approximately 2 periods. Comparison of the solution with that for a
globally embedded mesh shows the agreement between the two histories to be excellent (Fig. 6).

CONSERVATIVE  INTERFACE  TREATMENT 

The existence of embedded regions within the interior of the domain introduces internal boundaries
(interfaces) which must be treated carefully. We present a conservative interface treatment
and compare it with one that is non-conservative for steady state flows in both supersonic and
subsonic cases.

The  inviscid  terms  are  evaluated  by  performing  a  special  line  integration  at  cell  E  (Fig.  3b) 
which  includes  the  hanging  node  b.  The  cell  change  in  time  for  the  Ni  scheme  is  distributed 
to  all  5  nodes  of  cell  E  as  shown  in  Fig.  3a.  The  viscous  terms  require  piecewise  integration 
of  the  stresses  along  the  dashed  lines  (Fig.  3b)  on  each  cell  [4].  The  interface  cell  E  is  divided 
into  five  areas  which  are  allocated  to  each  node.  The  stress  fluxes  then  cancel  inside  the  cell 
and  the  treatment  becomes  conservative.  Due  to  stretching  in  the  cells,  there  is  an  error  in  the 
viscous  terms’  evaluation  which  is  locally  induced.  The  distributions  to  the  five  nodes  due  to  the 
smoothing  terms  (fourth  and  second  order)  sum  up  to  zero  which  makes  the  smoothing  operator 
conservative.  This  conservative  and  a  non-conservative  treatment  [4]  have  been  compared  in  the 
following  supersonic  and  subsonic  model  cases. 

Interfaces near a Shock. Channel flow with an M = 1.35 shock was used as a test with embedding
near the shock (Fig. 8). The embedded region is enclosed by the crosses. In Fig. 9
we compare mass flow across the channel for the two treatments and for a globally fine mesh
without interfaces. The nonconservative treatment clearly induces a mass flow error, while the
conservative one reproduces the same result as the globally fine mesh.

Interfaces inside a Boundary Layer. The same geometry in a subsonic flow (M = 0.5) and
with embedding at midheight inside the boundary layer was used to test the conservative and
nonconservative treatments. Fig. 7 shows skin friction distributions for both the embedded and
the globally fine grids. In this case the non-conservative interface treatment provides excellent
agreement with the globally fine solution. An interpretation is that the stretching error in the
conservative treatment is more important than the non-conservation error.

CONCLUSIONS

Concepts for adaptive procedures have been examined. Feature detection is made more accurate
by the introduction of embedded patches which act as ‘noise filters’ in the embedding
procedure. A method to implement the Baldwin-Lomax algebraic turbulence model when applied
to unstructured meshes proves to be accurate. Spatially varying time steps based on embedding
zones can be used for unsteady problems leading to accurate results with reduced CPU times.
Lastly, a conservative interface treatment is important in regions where shocks exist, but not necessarily
inside boundary layers where the stretching error in the evaluation of the viscous terms
may be dominant.

Figure 1: Cell group at a Station

Figure 2: Spatially varying time steps


PRELIMINARY SUMMARY OF TURBULENT SPOT EXPERIMENT

Three stages of transition have been observed and reported on earlier.
In the first stage the wave packet can be treated as a superposition of
all two- and three-dimensional waves according to linear stability theory,
and most of the energy then is centered around a wave corresponding to the
one that is most amplified. In the second stage, most of the energy is
transferred to waves which are centered around the wave having half the
frequency of the most amplified linear mode. During this stage the
amplitude of the wave packet increases from 0.5% to 5% of the free stream
velocity ($U_0$). In the final stage a turbulent spot develops, and the
amplitude of the disturbance increases to 27% of $U_0$.

The transition process that was obtained in the present experiment is
demonstrated in Figure 1. On the left of the figure the time evolution
of the velocity fluctuations in the stream direction is shown; on the
right their associated power spectra are plotted versus the nondimensional
frequency β. These were measured along the centerline and inside the
boundary layer where the mean velocity is 35% of the free stream velocity.
The results are shown for three down-stream stations (XD) representing the
three stages of transition mentioned above. Thus, increasing the initial
amplitude of the wave-packet did not result in any substantial change of
the transition process except for the fact that the entire process was
advanced further upstream. The linear stage covers the down-stream
distance where nonlinear interactions between various modes of the
wave-packet are insignificant. Therefore, increasing the initial amplitude
level of the wave-packet results in shortening of the linear stage.
Consequently the subharmonic and the spot formation stages have been
accomplished earlier at X = 230 cm and 300 cm respectively, relative to
X = 280 cm and X = 350 cm respectively in previous experiments.

The corresponding 2-D spectra are shown in Figure 2. They were obtained
by measuring the u and w velocity components in the X-Z plane.
These 2-D spectra indicate again that when the initial amplitude level
of the wave-packet is increased, its temporal-spatial structure remains
unchanged through the various stages of transition except for a backwards
shift in the X direction. However, the dimensional components of the
wave-packet do evolve differently. For example, the dimensional frequencies
of the waves which are undergoing resonance in the subharmonic
stage are different between the two cases and do depend on the initial
amplitude level.

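For 3-dimensional disturbances the linearized equations of motion
reduce to a set of two equations (see also Benney and Gustavsson,
Studies in Applied Mathematics 64:185-209, 1981):

$$\left[ (D^2 - k^2)^2 - i \alpha_x Re \left( (U - c)(D^2 - k^2) - U'' \right) \right] v = 0 \tag{1}$$

$$\left[ (D^2 - k^2) - i \alpha_x Re \, (U - c) \right] \eta = i \alpha_z Re \, U' v \tag{2}$$

$$u = \frac{i}{k^2} \left( \alpha_x D v - \alpha_z \eta \right), \qquad w = \frac{i}{k^2} \left( \alpha_z D v + \alpha_x \eta \right) \tag{3}$$

where $D = d/dy$, $k^2 = \alpha_x^2 + \alpha_z^2$, and $c$ is the complex phase velocity.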

Equation 1 is the familiar homogeneous Orr-Sommerfeld equation for
the vertical velocity v. The equation for the vertical vorticity, η, is
inhomogeneous, and its homogeneous part has a different eigenvalue operator.
While  the  structure  of  v  depends  only  on  the  solution  of  the  Orr-Sommerfeld 
equation,  the  structure  of  u  and  w  depends  on  both  equations.  Therefore, 
one  must  be  careful  about  the  interpretation  of  Squire's  theorem.  In 
other  words,  it  is  correct  that  the  2-dimensional  wave  is  the  one  that  is 
most  amplified,  and  that  it  is  the  most  energetic  wave  of  the  v  velocity 
component.  However,  it  is  not  necessarily  true  that  the  2-dimensional 
mode is the most energetic wave of the u velocity component. This point
is demonstrated in Figure 3. The 2-D power spectra measured at X = 170 cm
(linear stage) and at a Y location where $U/U_0$ is 0.5 are shown. At the
top  of  the  figure  the  2-D  power-spectra  of  the  u  and  v  velocity  components 
measured  with  X-wire  are  shown.  At  the  bottom  of  the  figure  the  2-D  power- 
spectra  of  the  u  and  w  velocity  components  measured  with  V-shaped  wire  are 
shown.  It  is  evident  that  while  the  most  energetic  mode  of  the  v  velocity 
component  is  a  2-dimensional  wave,  the  most  energetic  mode  of  the  u  velocity 
component  is  in  fact  a  3-dimensional  wave. 

At  three  downstream  stations  representing  the  linear,  subharmonic  and 
spot  formation  stages,  measurements  of  all  3  velocity  components  were  made, 
mapping  the  entire  Y-Z  plane.  At  each  height  Y  from  the  plate  a  double 
Fourier  transform  in  time  and  spanwise  direction  was  made,  and  therefore 
the  Y  distribution  for  each  mode  was  obtained.  The  distributions  for  the 
most  amplified  two-dimensional  wave  and  its  three-dimensional  subharmonic 
wave  which  has  the  same  phase  velocity  are  shown  in  Figure  4  for  the  three 
downstream  locations.  The  profiles  of  the  streamwise  and  vertical  velocity 
fluctuations  for  the  two-dimensional  wave  are  plotted  on  the  left,  while 
the  profiles  corresponding  to  the  three-dimensional  subharmonic  wave  are 
shown  on  the  right.  The  triangular  symbols,  representing  the  calculated 
data  points,  are  compared  with  the  solutions  of  the  linear  stability  theory 
given  by  the  solid  lines.  The  theoretical  calculations  are  compared  with 
experiments  by  equating  the  areas  under  each  curve  of  the  streamwise 
fluctuation only. The agreement between the theoretical calculations and
the data is fairly good for the linear and subharmonic stages, but is poor
for the spot formation stage.
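
The double Fourier transform in time and spanwise direction applied at each height Y can be sketched as below. This is an assumption-laden illustration rather than the processing actually used in the experiment: a direct (slow) discrete transform is written out so the example is self-contained, the synthetic signal is a single oblique wave, and all names are hypothetical.

```python
import cmath
import math


def dft2(samples):
    """Direct 2-D discrete Fourier transform of samples[t][z], normalized by N."""
    nt, nz = len(samples), len(samples[0])
    out = [[0j] * nz for _ in range(nt)]
    for f in range(nt):              # frequency index
        for k in range(nz):          # spanwise wavenumber index
            total = 0j
            for t in range(nt):
                for z in range(nz):
                    phase = -2j * math.pi * (f * t / nt + k * z / nz)
                    total += samples[t][z] * cmath.exp(phase)
            out[f][k] = total / (nt * nz)
    return out


# Synthetic velocity record at one height: one wave with frequency index 2
# and spanwise wavenumber index 1 (a three-dimensional, oblique mode).
nt, nz = 8, 8
signal = [[math.cos(2 * math.pi * (2 * t / nt + 1 * z / nz)) for z in range(nz)]
          for t in range(nt)]
spectrum = dft2(signal)
power = [[abs(c) ** 2 for c in row] for row in spectrum]
```

Repeating the transform for the record at each height gives the Y distribution of the amplitude of any chosen (frequency, spanwise wavenumber) mode. For real data sets a fast Fourier transform routine would normally replace the direct sums.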

ABSTRACT

We review an embedded adaptation algorithm to illustrate applications and requirements
for Navier-Stokes and non-equilibrium, unsteady, two-dimensional flow fields. Both grid refinement
and equation modification are included for unstructured distributions of quadrilateral cells.
Multiple refinement parameters for distinct physical phenomena serve to monitor the evolving
nonuniform field and to recognize flow features in isolated and overlapping subdomains. Modeling
time accurate behavior involves tracking of features through a space/time domain; turbulent
modeling requires care for an unstructured nonuniform mesh; directional grid refinement conforms
to feature geometry. Examples from subsonic, hypersonic, internal and external fields show
substantial advantages for the adaptive approach for equivalent numerical accuracies.

1.  INTRODUCTION* 

A  concerted  effort  has  led  to  the  development  of  adaptive  codes  that  monitor  flow  fields 
during  computation  and  simultaneously  modify  some  aspect  of  the  numerical  algorithm,  thereby 
achieving  a  required  level  of  accuracy  and  resolution  without  a  priori  knowledge  of  field  details.  We 
specifically  review  an  embedded  adaptation  approach,  and  extensions  to  two-dimensional  Navier- 
Stokes,  finite  rate  chemistry,  and  time  accurate  descriptions,  including  their  special  Euler,  perfect 
gas,  and  steady  state  forms. 

The  motivation  is  that  space  and  time  scales  differ  appreciably  within  a  given  domain.  Typical 
aeronautical  applications  may  involve  shock  and  contact  surface  discontinuities,  stiff  relaxation 
regions  due  to  finite  rate  chemistry,  and  thin  shear  layers  adjacent  to  or  separated  from  surfaces,  as 
flow  features  which  dominate  field  behavior  and  are  demanding  in  terms  of  a  numerical  method’s 
accuracy  and  robustness. 

The basic concept is that beneficial and/or necessary changes to the physical/mathematical
description, the mesh, and the integration algorithm are apparent from the evolving solution
field. It is assumed that reasonable initial choices can be made, and that the algorithm both
recognizes physics associated with individual scales and provides a basis for the transition from a
grid suggested by the configuration and domain to one fitting the fluid events. Changes involve the
precision of physical descriptions, mathematical classification, and mesh scale. Most frequently
adaptation has implied grid refinement so as to reduce the truncation error or to better resolve a
feature after identifying the locale and its essential shape.

* Acknowledgement: I am indebted to my students Drs. J.F. Dannenhoffer, J.G. Kallinderis, M.M.
Pervaiz, and Lt. M.J. Aftosmis, for the development of adaptation schemes and solutions. Sponsors were
the Air Force Office of Scientific Research, NASA Langley Research Center, and AFWAL Flight Dynamics
Laboratory.

Clearly  adaptation  procedures  introduce  their  own  scales.  The  local  insertion  of  fine  cells 
results  in  mesh  scale  discontinuities  both  spatially  and  temporally,  i.e.  interfaces  in  a  subdivided 
domain.  Similarly,  the  effect  of  embedding  is  to  reinitialize  the  field;  this  implies  a  non-unique 
strategy and a need for special care for accurate transient evaluations. Such interactions between
scheme, procedure and influence have led to several suggested zonal, overlay, mixed topology, and
hybrid (mesh enrichment and grid displacement) approaches, e.g. [1-7], in part overcoming distortions
that arise for simple mesh redistributions.


2.  BASIC  INTEGRATOR,  DETECTOR,  AND  ADAPTOR  METHODS 

Ideally  the  integration  scheme  should  be  independent  of  both  the  specific  adaptation  algorithm 
and  the  mesh  structure,  the  former  simply  to  be  portable,  the  latter  to  ensure  grid  refinement  due 
solely  to  physics.  The  basis  here  is  a  Lax-Wendroff,  finite  difference,  finite  volume,  multiple  grid 
accelerator  scheme  [8,9]  that  has  been  modified  from  its  original  perfect  gas  2-D  Euler  form  to 
a  Navier-Stokes  version  [10,11],  and  separately  with  nonequilibrium  conservation  relations  added 
[13-14]. 

The  nonuniform  field  provides  a  measure  of  the  importance  of  specific  physical  phenomena, 
their  recognition,  and  reasonable  parameter  variations  on  which  decisions  may  be  based.  For 
example,  departures  of  density  differences  from  an  average  over  all  cells  prove  quite  useful  in 
adjusting  grid  scales  for  inviscid  compressible  flow  and  discontinuities;  divided  velocity  differences 
are  appropriate  for  shear  layers;  multiple  refinement  parameters  are  necessary  when  different 
physical  features  are  present.  Providing  a  variable  grid  scale  achieves  a  smooth  solution  overall 
[12]. In all cases the embedding adds grids and nodes to a fixed topology, and essentially aligns
the refined region to the feature contour without reducing resolution or accuracy elsewhere. In
doing so the embedding is characteristically either external or internal to each feature in accord
with the inequality ratio of feature and cell scales, $l_F/l_C$ < or > 1.
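
A minimal sketch of a density-difference refinement parameter is shown below. The statistical threshold (mean plus `k` standard deviations over all cells) and the names `flag_cells`, `neighbors`, and `k` are assumptions made for illustration; the text only states that departures of density differences from an average over all cells are used.

```python
import statistics


def flag_cells(density, neighbors, k=1.0):
    """Flag cells whose density-difference parameter exceeds mean + k * stdev.

    density[c] is the cell value; neighbors[c] lists the cells adjacent to c.
    """
    parameter = [max(abs(density[c] - density[n]) for n in neighbors[c])
                 for c in range(len(density))]
    threshold = statistics.mean(parameter) + k * statistics.pstdev(parameter)
    return [p > threshold for p in parameter]


# One-dimensional row of ten cells with a density jump between cells 4 and 5.
density = [1.0] * 5 + [2.0] * 5
neighbors = [[n for n in (c - 1, c + 1) if 0 <= n < len(density)]
             for c in range(len(density))]
flags = flag_cells(density, neighbors)
```

A second parameter, for example a divided velocity difference for shear layers, could be thresholded in the same way and the two flag lists combined with a logical OR when several feature types are present.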

A number of techniques and strategies result from the non-unique coupling of physics, background
topology, initial coarseness, and refinement. External adaptation essentially captures a
feature and adjusts to the surrounding field according to its own scale. An internal adaptation
may dominate the demands on data storage. In either case, directional rather than isotropic
subdivision of cells provides appreciable savings. Often preembedding is clearly advantageous,
such as when surface shear regions or moving discontinuities are evident. Alternatively, a nested
sequence of external adaptation levels may be imposed simultaneously when a feature is first recognized.
Temporal adaptation can be related to Courant number/spatial adaptation constraints,
but also may be governed by nonequilibrium time scales. The basis for adaptive sufficiency may
be continued refinement until convergence results for a global parameter, e.g. lift, rather than
local residuals, consistent with the concept of capturing interacting features in sufficient detail.


3. EXAMPLES FROM VISCOUS, NONEQUILIBRIUM, AND HYPERSONIC FLOWS

Several  applications  illustrate  the  behavior  of  the  algorithm  when  focused  on  different  physical 
features. Fig. 1 shows a basic inviscid example of four adaptation cycles carried out for a NACA-0012
airfoil section at $M_\infty$ = 0.85 and 1.0° angle of attack [9]. Adaptive cycles were imposed
on convergence to specific residual levels, here $2 \times 10^{-4}$, and monitoring of global lift and drag
aerodynamic coefficients resulted in 4 cycles.

Reacting  inviscid  flow  requires  active  refinement  parameters  for  nonequilibrium  variables  as 
well  as  density  and  provides  another  example  of  external  embedding.  Very  stiff  situations  can  be 
associated  with  the  unsteady  case,  such  that  resolution  demands  are  crucial  even  with  implicit 
modeling  of  chemical  source  terms.  Temporal  adaptation  takes  on  special  importance,  and  has 
been  implemented  for  a  shock  tube  (Fig.  2)  to  illustrate  how  the  space/time  adaptation  involves 
embedding  and  later  removal  after  passage  of  the  several  disturbance  regions  [13]. 

A  generalized  schematic  of  the  uncoupled  space/time  grid  refinement  is  also  shown  for  two 
space  dimensions,  with  time  steps  limited  by  either  convective  or  reactive  constraints,  and  is 
the  background  for  the  momentary  grid  that  corresponds  to  a  shock  interacting  with  a  circular 
arc  configuration  (Fig.  2)  in  a  dissociating  oxygen  medium  [14].  The  grid  importance,  and  the 
equivalence  of  an  adapted  and  globally  fine  mesh,  is  evident  from  Fig.  3. 

Viscous features require internal embedding and usually have been preembedded (clustered)
when shear layer locations and their extents are known. The procedural concern is with unstructured
cell independence for discreteness evaluations of stress contributions, and with limitations
on structured concepts that are inherent in turbulence modeling. Fig. 4 illustrates adaptation
applied to a RAE 2822 airfoil transonic field which includes attached and separated turbulent
shear layers as well as shocks. An assumed Baldwin-Lomax model was interpreted in cell centered
fashion with adjustments and interpolations to meet requirements along surface normals [12]. The
grid  in  Fig.  4  shows  the  characteristic  dense  concentration  along  shock,  surface  and  wake  regions, 
the  latter  being  in  accord  with  the  minimum  velocity  locus.  On  the  enlarged  trailing  edge  portion 
in  the  lower  part  of  the  figure  the  superimposed  crosses  indicate  embedded  interface  locations, 
and  the  presence  of  smooth  Mach  number  contours  in  the  multiply  adapted  confined  region  is 
satisfying. 

Another  example  involving  multiple  features  is  that  of  a  hypersonic  shock  layer  (Fig.  5). 
Inviscid,  reacting  flow  is  confined  to  a  thin  domain  exhibiting  both  entropy  and  relaxation  layers 
that  are  not  always  separated  in  space.  The  free  shock  boundary  implies  adaptation  being  applied 
within a changing domain during iteration to a steady state. Results for a Mach number 12
example  show  the  coarse  grid  refinement  into  a  finer  mesh  downstream  of  the  shock  as  a  result  of 
chemical  relaxation,  and  elsewhere  due  to  rapid  expansions  and  entropy  variations. 




4.  REFERENCES 


1. Dwyer, H.A., Kee, R.J. and Sanders, B.R., Adaptive Grid Method for Problems in Fluid Mechanics
and Heat Transfer, J. AIAA, 18, 1205-1212 (1980).

2. Rai, M.M. and Anderson, D.A., Application of Adaptive Grids to Fluid Flow Problems with Asymptotic
Solutions, AIAA Paper 81-0114 (1981).

3. Berger, M.J. and Oliger, J., Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations,
NYU Courant Institute Manuscript NA-83-02 (1983).

4. Erlebacher, G. and Eiseman, P.R., Adaptive Triangular Mesh Generation, AIAA Paper 84-1607 (1984).

5. Arney, D.C. and Flaherty, J.E., A Two-Dimensional Mesh Moving Technique for Time Dependent
Partial Differential Equations, J. Computational Physics, 67, 124-144 (1986).

6. Steger, J.L., Dougherty, F.C. and Benek, J.A., A Chimera Grid Scheme, Advances in Grid Generation,
Ghia, Editor, ASME FED 5 (1983).

7. Lohner, R., Adaptive Remeshing for Transient Problems with Moving Bodies, AIAA Paper 88-3737
(1988).

8. Ni, R.H., A Multiple Grid Scheme for Solving the Euler Equations, J. AIAA, 20, 1565-1571 (1982).

9. Dannenhoffer, J.F., Grid Adaptation for Complex Two-Dimensional Transonic Flows, M.I.T. PhD
Thesis (1987).

10. Kallinderis, J.G. and Baron, J.R., Adaptation Methods for a New Navier-Stokes Algorithm, AIAA
Paper 87-1167 (1987).

11. Kallinderis, J.G. and Baron, J.R., Unsteady and Turbulent Flow using Adaptation Methods,
Proceedings 11th Intl. Conf. on Numerical Methods in Fluid Dynamics, Williamsburg (1988).

12. Kallinderis, J.G. and Baron, J.R., Adaptation Methods for Viscous Flows, Computational Methods in
Viscous Aerodynamics, C.A. Brebbia, Editor (1988).

13. Pervaiz, M.M. and Baron, J.R., Temporal and Spatial Adaptive Algorithm for Reacting Flows,
Communications in Applied Numerical Methods, 4, 97-111 (1988).

14. Pervaiz, M.M., Spatio-Temporal Adaptive Algorithm for Reacting Flows, M.I.T. PhD Thesis (1988).

15. Aftosmis, M.J. and Baron, J.R., Adaptive Grid Embedding in Nonequilibrium Hypersonic Flows, AIAA
24th Thermophysics Conference, Buffalo (1989).




Fig.  1:  NACA-0012  Airfoil,  Inviscid  Flow  at  ... 

$M_\infty = 0.95$ and $\alpha = 1$ deg. Top: Initial Coarse Grid. Middle: Final, Four Adaptations
Grid. Bottom: Surface Mach Number Distributions (Line, Symbols) are Solutions on (Initial, Final)
Grids.

Fig. 2 Temporal Adaptation. Top: Shock Tube Spatial Grid Variation with Time. Middle: Schematic
of Three Dimensional Adapted Cell Distribution in Space/Time. Bottom: Reacting (Dissociating
Oxygen) Flow for M = 2. Transient Shock over 15% Circular Arc Section. Grid and Density
Contours, t = 0.6.


Fig. 4 RAE 2822 Airfoil, Turbulent Boundary Layer ... $6.5 \times 10^6$, $\alpha = 2.44$ deg.
Top: Grid for 3 Adaptation ... Contours and Interface Markers for Separated Flow
1 Abstract

A finite element algorithm for solving the steady Euler equations is presented. The extension
of this algorithm to biquadratic elements in two dimensions is discussed, and results verifying
the method are presented. Examples which demonstrate the computational savings realizable with
biquadratic elements are shown. A method of degenerating a biquadratic element into 4 bilinear
elements to improve shock capturing properties is developed. The extension of the algorithm to
3-D is discussed. Examples of flow in a corner and flow in a scramjet inlet are shown.


2  Introduction 

Numerical solutions of the Euler equations describing the dynamics of an inviscid, compressible,
ideal gas are becoming an important tool for the practicing aerodynamicist [1]. In recent years,
many methods suitable for unstructured grids have been introduced, including the Galerkin finite
element algorithm [2,3,4] and the cell-vertex finite element algorithm [5,6,7]. The main advantage
of the unstructured grid methods is geometric flexibility. They allow complex geometries to be
treated in a straightforward manner, and allow one to use grid embedding with relative ease.

This paper explores two finite element ideas new to the solution of the steady Euler equations.
The first idea is the use of higher-order (biquadratic) finite elements in two dimensions, and
the second idea is the use of adaptive, hexahedral finite elements in three dimensions.
A mesh of biquadratic elements requires far fewer elements for a given accuracy than a mesh of
bilinear elements. Biquadratic finite elements have been applied to the Euler equations with
limited success previously [8], but we present a formulation which produces excellent results
and significant computational savings over bilinear elements (with or without adaptation) for
many problems. Preliminary results were reported in [4]. These and later examples demonstrate
that the biquadratic elements can exhibit some mild oscillation at discontinuities. To try to
correct the oscillation problem, we explored the use of a mixture of biquadratic and bilinear
elements in the same problem, adaptively degenerating a single biquadratic element into four
bilinear elements in regions near discontinuities. Test problems include channel flow over a
circular arc bump, flow in a converging channel (including comparisons with the exact solution),
and flow in a scramjet inlet.

In three dimensions, adaptation with grid embedding using hexahedral finite elements is
demonstrated. Previous work on adaptive methods for the Euler equations using grid embedding in
two dimensions includes the work of Dannenhoffer [9], Lohner [10], Shapiro and Murman [4], and
Oden [11]. In three dimensions, the primary adaptive methods have been the grid movement methods
[12], and the grid regeneration methods [13,14]. To our knowledge, this work is the first use of
adaptation by grid embedding for hexahedral elements. Test examples showing flow in a corner and
flow in a 3-D scramjet inlet will be presented.


3  Solution  Algorithm 

This section briefly describes the solution algorithm used for the 3-D trilinear and 2-D
biquadratic elements. For a more complete discussion of the algorithm, see [15]. The Euler
equations describing the flow of an inviscid, compressible fluid in two dimensions can be
written in conservative form as

\[
\frac{\partial}{\partial t}
\begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho e \end{bmatrix}
+ \frac{\partial}{\partial x}
\begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u h_0 \end{bmatrix}
+ \frac{\partial}{\partial y}
\begin{bmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v h_0 \end{bmatrix}
= 0, \tag{1}
\]

where $e$ is total energy, $p$ is pressure, $\rho$ is density, $u$ and $v$ are the flow
velocities in the $x$ and $y$ directions, and $h_0$ is the total enthalpy, given by the
thermodynamic relation

\[ h_0 = e + \frac{p}{\rho}. \tag{2} \]

In addition, one requires an equation of state in order to complete the set of equations. For
an ideal gas, this can be written

\[ \frac{p}{\rho} = (\gamma - 1)\left[ e - \tfrac{1}{2}\left(u^2 + v^2\right) \right], \tag{3} \]

where the specific heat ratio ($\gamma = 1.4$) is constant for all calculations reported.

It is convenient to write the equations in vector form as

\[ \frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = 0, \tag{4} \]

where $U$ is the vector of state variables, and $F$ and $G$ are flux vectors in the $x$ and $y$
directions, corresponding to the vectors in Eq. (1) above. To expand these equations to 3-D,
a z-momentum equation and z flux term must be added to the above equations.

3.1 Spatial discretization

The finite element approach to discretizing these equations divides the domain of interest into
elements determined by some number of nodes. In general, these elements can be of any shape. In
this paper, we have chosen to use quadrilateral elements in two dimensions and hexahedral
elements in three dimensions. The algorithm permits the degeneration of quadrilaterals into
triangles and hexahedra into other shapes, so this is not a major geometric restriction.
Quadrilateral and hexahedral elements were chosen since grids of these elements require fewer
elements for a given number of nodes than a comparable triangular or tetrahedral grid, resulting
in some storage and/or CPU savings. In the current implementation, 9 nodes are used for
biquadratic elements and 8 nodes are used for 3-D elements. The 9-node element is a
subparametric element, with the geometry interpolated bilinearly and the fluxes and state
vectors interpolated biquadratically. Figure 1 shows the geometry of the 9-node element, and
shows how the physical coordinates are mapped to natural coordinates at the element level. The
spatial discretization method begins with the Euler equations in conservative form (Eq. (4)).
Within each element the state vector $U^{(e)}$ and flux vectors $F^{(e)}$, $G^{(e)}$ are written

\[ U^{(e)} = \sum_i N_i^{(e)} U_i, \tag{5} \]

\[ F^{(e)} = \sum_i N_i^{(e)} F_i, \tag{6} \]

\[ G^{(e)} = \sum_i N_i^{(e)} G_i, \tag{7} \]

where $U_i$, $F_i$, and $G_i$ are the nodal values of the state vector and flux vectors, and
$N_i^{(e)}$ is the set of interpolation functions for element $e$.

These expressions can be differentiated to obtain formulas for the derivative of a quantity in
each element in terms of the nodal values of that quantity and the geometry of the element. In
all the following steps, the two-dimensional algorithm will be shown for simplicity, with the
basic steps identical for three dimensions.

The expressions for the derivatives are substituted into equation (4) and summed over all
elements to obtain

\[ N \frac{dU_i}{dt} = -\frac{\partial N}{\partial x} F_i - \frac{\partial N}{\partial y} G_i, \tag{8} \]

where $N$ is now a global row vector of interpolation functions, determined by summing the
interpolation functions for each element.

It is impossible to make Eq. (8) hold for all points in space (since the space of interpolation
functions does not include all solutions to the Euler equations), so some "average" solution is
required. The next step creates a weak form of the equations. This can be thought of as a
projection onto the space spanned by some other row vector of functions $\tilde{N}$, called test
functions, such that the error in the discretization is orthogonal to the space spanned by the
test functions. In the weak form, the equation is no longer required to be satisfied pointwise,
but instead the equation is required to hold for each test function. This allows the
introduction of discontinuous solutions, as well as providing some means for obtaining the nodal
values of the unknowns. To create this weak form, premultiply Eq. (8) by $\tilde{N}^T$ and
integrate over the entire domain. When this is done, one obtains

\[
\iint \tilde{N}^T N \, dx\, dy \; \frac{dU_i}{dt}
= -\iint \tilde{N}^T \frac{\partial N}{\partial x} \, dx\, dy \; F_i
  -\iint \tilde{N}^T \frac{\partial N}{\partial y} \, dx\, dy \; G_i,
\]

which results in the semi-discrete equation

\[ M \frac{dU_i}{dt} = -R_x F_i - R_y G_i, \tag{11} \]

where $M$ is the consistent mass matrix, and $R_x$ and $R_y$ are residual matrices. The matrices
$M$, $R_x$ and $R_y$ involve the integration of quantities over the domain. These integrations
are done at the element level in natural (element-based) coordinates, and assembled to give the
global matrices.
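
The element-level integration can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: it treats a 4-node bilinear quadrilateral (the geometry mapping used in the paper), takes the Galerkin choice of test functions so that the mass matrix entries are integrals of $N_i N_j$, and uses 2x2 Gauss quadrature. The function names `shape_functions` and `element_mass_matrix` are hypothetical.

```python
import math

# Corner signs of the bilinear element in natural coordinates (counter-clockwise).
CORNERS = [(-1, -1), (1, -1), (1, 1), (-1, 1)]

def shape_functions(xi, eta):
    """Bilinear shape functions and their natural-coordinate derivatives."""
    n = [0.25 * (1 + a * xi) * (1 + b * eta) for a, b in CORNERS]
    dn_dxi = [0.25 * a * (1 + b * eta) for a, b in CORNERS]
    dn_deta = [0.25 * b * (1 + a * xi) for a, b in CORNERS]
    return n, dn_dxi, dn_deta

def element_mass_matrix(xy):
    """Consistent element mass matrix, M_ij = integral of N_i * N_j.

    xy is a list of four (x, y) corner coordinates, counter-clockwise.
    The integral is evaluated in natural coordinates with 2x2 Gauss points.
    """
    g = 1.0 / math.sqrt(3.0)
    m = [[0.0] * 4 for _ in range(4)]
    for xi in (-g, g):
        for eta in (-g, g):
            n, dn_dxi, dn_deta = shape_functions(xi, eta)
            x_xi = sum(d * p[0] for d, p in zip(dn_dxi, xy))
            y_xi = sum(d * p[1] for d, p in zip(dn_dxi, xy))
            x_eta = sum(d * p[0] for d, p in zip(dn_deta, xy))
            y_eta = sum(d * p[1] for d, p in zip(dn_deta, xy))
            det_j = x_xi * y_eta - x_eta * y_xi  # Jacobian of the mapping
            for i in range(4):
                for j in range(4):
                    m[i][j] += n[i] * n[j] * det_j  # Gauss weights are all 1
    return m
```

For a unit square the entries sum to the element area, which is a quick check that the mapping and quadrature are consistent. The global matrix would be obtained by adding each element matrix into the rows and columns of its nodes.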
Figure 1: Geometry of Two-Dimensional Element (panels: Physical Coordinates, Natural Coordinates)


As derived, Eq. (11) gives a coupled set of ODEs to solve for the nodal values of the state
vector. The mass matrix $M$ is sparse and positive definite but unstructured, so that its
inversion requires considerable computational effort. If one is only interested in the steady
state, $M$ can be replaced by a "lumped" (diagonal) matrix $M_L$, where each diagonal entry is
the sum of all the elements in the corresponding row of $M$. This allows Eq. (11) to be solved
explicitly. If one wishes to solve the unsteady Euler equations, it is better to invert the mass
matrix with a few iterations of a preconditioned conjugate-gradient solver [16].
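
Row-sum lumping is simple enough to show directly. The sketch below assumes the mass matrix is available as a dense list of rows, which is for illustration only (a practical code would accumulate the row sums during assembly); the names `lump_mass_matrix` and `explicit_rate` are hypothetical.

```python
def lump_mass_matrix(mass):
    """Return the diagonal of the lumped matrix: one row sum per node."""
    return [sum(row) for row in mass]

def explicit_rate(mass, rhs):
    """Nodal time derivatives from the lumped system, with no linear solve.

    rhs holds the right-hand side of the semi-discrete equation at each node.
    """
    lumped = lump_mass_matrix(mass)
    return [r / m for r, m in zip(rhs, lumped)]
```

Because the lumped matrix is diagonal, each node is updated independently, which is what makes the explicit time integration possible.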


3.2  Solid  Surface  Boundary  Condition 

At walls, the portions of the flux vectors representing convection normal to the wall are set
to zero before each iteration, and flow tangency is enforced after each iteration. The equation
for the fluxes is then

\[
F = \begin{bmatrix} \rho u_m \\ \rho u u_m + p \\ \rho v u_m \\ \rho h_0 u_m \end{bmatrix},
\qquad
G = \begin{bmatrix} \rho v_m \\ \rho u v_m \\ \rho v v_m + p \\ \rho h_0 v_m \end{bmatrix},
\tag{12}
\]

where $u_m$ and $v_m$ are corrected velocities such that the total convective contribution
normal to the wall is 0. These velocities are the $x$ and $y$ components of the tangential
velocity, given by

\[ u_m = u\,(1 - n_x^2) - v\, n_x n_y, \tag{13} \]

\[ v_m = v\,(1 - n_y^2) - u\, n_x n_y, \tag{14} \]

where $n_x$ and $n_y$ are the components of the unit normal at the node. This is easily derived
from the vector expression

\[ \mathbf{v}_m = \mathbf{v} - (\mathbf{v} \cdot \hat{n})\,\hat{n}, \tag{15} \]

where $\hat{n}$ is the unit normal to the wall. This expression is obtained by finding the
normal component of the velocity, $(\mathbf{v} \cdot \hat{n})\,\hat{n}$, and subtracting it from
the velocity vector.
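
The tangency correction can be written as a few lines of code. The sketch below implements the component formulas for the corrected velocities; the function name `wall_tangential_velocity` is hypothetical, and the renormalization of the normal is an added safeguard, not something the paper describes.

```python
import math

def wall_tangential_velocity(u, v, nx, ny):
    """Remove the wall-normal component from the velocity (u, v).

    (nx, ny) is the wall normal at the node; it is renormalized here so the
    result stays tangent even if the input is not exactly of unit length.
    """
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length
    um = u * (1.0 - nx * nx) - v * nx * ny
    vm = v * (1.0 - ny * ny) - u * nx * ny
    return um, vm
```

For any input, the returned velocity satisfies `um * nx + vm * ny == 0` up to rounding, which is the flow tangency condition.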

3.3  Far-Field  Boundary  Condition 

A one-dimensional characteristic treatment is used on the open or far-field boundary. From the
inward-directed unit normal vector $\hat{n}$, the unit tangent vector $\hat{t}$ and the normal
and tangential velocities $u_n$ and $u_t$ are calculated. The 1-D Riemann invariants (and the
corresponding wave speeds) are

At each point on the boundary, the invariants are calculated using the solution state vector
$U$ and the "free stream" state vector $U_\infty$. Then, a decision is made based on the sign of
the corresponding wave speed (from the interior $u_n$) whether to use the invariant based on the
current state, or the invariant based on the free stream. If the relevant wave speed is positive
(at a supersonic inflow boundary, for example, all 4 characteristic speeds are positive), then
the free stream value is used for that characteristic.
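
The selection rule described above can be sketched without committing to a particular form of the invariants. In this illustration each characteristic is represented only by its interior value, its free-stream value, and its wave speed measured along the inward normal; the function name `select_invariants` and the list-based interface are assumptions made for the example.

```python
def select_invariants(interior, free_stream, wave_speeds):
    """Choose, per characteristic, the interior or free-stream invariant.

    A positive wave speed along the inward normal means the information
    enters the domain, so the free-stream value is taken; otherwise the
    value computed from the current interior state is kept.
    """
    return [outer if speed > 0.0 else inner
            for inner, outer, speed in zip(interior, free_stream, wave_speeds)]
```

At a supersonic inflow boundary every speed is positive and all values come from the free stream; at a subsonic outflow boundary only one speed is positive, so only one free-stream value is used.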




The invariants are transformed back into a set of primitive variables, and these primitive
variables are used to calculate the fluxes at the boundary nodes for use in the residual
calculation. After the complete iteration, the characteristics are also used to update the
state vectors at the boundary. Although this is not necessary for convergence, updating the
state vectors after each iteration improves the robustness of the algorithm, especially for
biquadratic elements.

For many problems with subsonic outflow, particularly problems involving choked flow, evaluating
$C_1$ based on the free stream is not an appropriate boundary condition. Specifying exit
pressure is a more physical condition, and this condition is enforced by setting $C_1$ such that
when the characteristics are transformed back into primitive variables, the desired pressure is
obtained.

3.4  Smoothing 

To capture shocks and stabilize the scheme, an artificial viscosity needs to be added.
Currently, the smoothing used consists of a fourth-difference term and a pressure-switched
second-difference term. Due to the unstructured nature of the grids, a Laplacian-type second
difference is used, instead of normal and tangential or $\xi$ and $\eta$ differences.

The heart of the smoothing method is the calculation of an elemental contribution to a second
difference. For the three-dimensional elements and the 2-D sub-elements described below, the
second difference of a quantity $q$ is calculated by the following procedure:

1. In each element, calculate an elemental average of $q$ by summing all the nodal values of
$q$ and dividing by the number of nodes in the element.

2. The contribution of the element to the second difference at node $i$ is
$W(q_{avg} - q_i)$, where $W$ is some elemental weighting factor (often 1).

3. Sum the elemental contributions to a node over all elements containing that node.
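
The three steps above translate into a single loop over elements. The sketch below is a minimal version that works for any element type, since it only needs the list of node indices of each element; the function name `second_difference` and the data layout are assumptions for illustration.

```python
def second_difference(elements, q, weights=None):
    """Nodal second difference of q, accumulated element by element.

    elements: list of tuples of node indices, one tuple per element
    q:        list of nodal values
    weights:  optional list of elemental weighting factors W (default 1)
    """
    d2 = [0.0] * len(q)
    for k, nodes in enumerate(elements):
        w = 1.0 if weights is None else weights[k]
        q_avg = sum(q[n] for n in nodes) / len(nodes)  # step 1
        for n in nodes:
            d2[n] += w * (q_avg - q[n])                # steps 2 and 3
    return d2
```

On a line of two-node elements the interior result reduces to half the familiar three-point difference, and it vanishes for a linear field, which is a convenient sanity check. The contributions of each element sum to zero, so the operator adds nothing net to the field.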

For the biquadratic element, the main element is divided into 4 sub-elements (Fig. 2) and a
second difference is calculated on the sub-elements as described above. To calculate the
complete smoothing for a time step, one first calculates the nodal second difference of
pressure. This is turned into an elemental quantity by simple averaging over all the nodes in an
element. The elemental second difference is then normalized by an elemental pressure average to
form an elemental weight for the following step. The second-difference smoothing term is the
weighted second difference of the state vectors, multiplied by a constant between 0 and 0.05.
The fourth-difference smoothing term is the second difference of the second difference of the
state vectors multiplied by a constant between 0.001 and 0.05. The sum of these two terms is
added directly into the time integration of Eq. (17).

Figure 2: Division of biquadratic element into 4 subelements (A, B, C and D)


3.5  Time  Integration 

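To integrate equation (11), the following multi-step method is used:

\[
\begin{aligned}
U_i^{(1)} &= U_i^n + \tfrac{1}{4}\,\lambda \left( \frac{\Delta t_i}{M_{Li}} R_i(U^n) + V_i^n \right) \\
U_i^{(2)} &= U_i^n + \tfrac{1}{3}\,\lambda \left( \frac{\Delta t_i}{M_{Li}} R_i(U^{(1)}) + V_i^n \right) \\
U_i^{(3)} &= U_i^n + \tfrac{1}{2}\,\lambda \left( \frac{\Delta t_i}{M_{Li}} R_i(U^{(2)}) + V_i^n \right) \\
U_i^{(4)} &= U_i^n + \lambda \left( \frac{\Delta t_i}{M_{Li}} R_i(U^{(3)}) + V_i^n \right) \\
U_i^{n+1} &= U_i^{(4)}
\end{aligned}
\tag{17}
\]

where $R_i(U)$ is the right-hand side of Eq. (11) with the fluxes based on state vector $U$,
$V_i$ is the smoothing term described above, $M_{Li}$ is the entry in the lumped mass matrix for
node $i$, and $\lambda$ is the CFL number. Local time stepping is used to accelerate
convergence, with the time step given by

\[ \Delta t_i = \frac{\Delta x_i}{|u| + a}, \tag{18} \]

where $\Delta x_i$ is some nodal characteristic length, $u$ is the flow velocity at the node,
and $a$ is the local speed of sound. In this algorithm, for $\Delta x_i$ we use the minimum
(over all elements containing the node) of the average lengths of opposite sides of the element.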
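
A compact sketch of one pass of the time integration is given below. It is an illustration under assumptions rather than the authors' implementation: the stage coefficients 1/4, 1/3, 1/2, 1 are the common four-stage choice and are assumed here, the smoothing term is frozen at its value from the start of the step, and `residual` is a hypothetical user-supplied function returning the right-hand side of the semi-discrete equation at every node.

```python
def local_time_step(dx, speed, sound_speed):
    """Nodal time step from a characteristic length and the fastest wave."""
    return dx / (abs(speed) + sound_speed)

def multistage_step(u, residual, smoothing, dt_over_m, cfl):
    """Advance the nodal values u through one four-stage step.

    u:         list of nodal values at time level n
    residual:  function mapping a list of nodal values to nodal residuals
    smoothing: list of nodal smoothing terms, evaluated at time level n
    dt_over_m: list of local time step divided by lumped mass, per node
    cfl:       CFL number multiplying every stage
    """
    stage = list(u)
    for alpha in (0.25, 1.0 / 3.0, 0.5, 1.0):  # assumed stage coefficients
        r = residual(stage)
        stage = [u_i + alpha * cfl * (c * r_i + v_i)
                 for u_i, c, r_i, v_i in zip(u, dt_over_m, r, smoothing)]
    return stage
```

With these coefficients and a linear residual, the four stages reproduce the first five terms of the Taylor series of the exact solution, which gives a simple way to test the routine.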
3.6 Choice of Test Functions


Various choices for $\tilde{N}$ are possible, each giving rise to a particular spatial
discretization. The details of the selection, and the advantages and disadvantages of each, are
discussed more fully in [15]. Briefly, if one chooses $\tilde{N}^{(e)} = N^{(e)}$ one obtains
the Galerkin finite element approximation. This is the "traditional" finite element method, and
is the method used for the biquadratic elements. Some of the 2-D, linear examples shown in this
paper also make use of the Galerkin method.


One can also choose $\tilde{N}^{(e)}$ constant, which results in the cell-vertex approximation.
This is the approximation used in three dimensions, and for some of the linear 2-D examples
shown. It was not used for the biquadratic elements, as it appears to be unstable. The likely
cause of this instability is that the mid-element node in the biquadratic element is decoupled
from the other 8 nodes in the element. That is, the flux at the center node does not enter into
the computation of the residuals at the other 8 nodes.


For bilinear elements, the Galerkin and cell-vertex methods produce almost identical answers for
a wide range of problems [4,15], and so no further distinction between these methods will be
made in this paper for the 2-D, bilinear solutions presented.
4 Comparison of Bilinear and Biquadratic Elements


Biquadratic elements have the potential to produce more accurate solutions on coarser meshes.
The use of coarser meshes may allow the CPU cost for a given accuracy to be reduced. A single
biquadratic Galerkin element requires about 3.3 times the computation of a single bilinear
Galerkin element. Since one can view a biquadratic element as replacing 4 bilinear elements (for
a given number of nodes), there is a slight savings attained here. In many cases, however, a
single biquadratic element can produce results comparable to 16 bilinear elements, resulting in
substantial computational savings. Unfortunately, the biquadratic elements have some slight
problems near shocks due to the Gibbs phenomenon. Three test problems are used to compare the
bilinear and biquadratic formulations of the Galerkin method in two dimensions. These problems
are 5° converging channel flow at Mach 2, the flow over a 10% cosine-squared bump in a channel
at Mach 0.5, and the flow in a scramjet inlet at Mach 3. The first and third cases demonstrate
the ability of the biquadratic elements to handle flows with shocks. The second case
demonstrates the ability to capture smooth flows.
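
The per-element cost argument above amounts to a one-line ratio. The snippet below works it through, assuming only the factor of 3.3 quoted in the text; the function name `relative_cost` is hypothetical.

```python
def relative_cost(n_bilinear_replaced, cost_factor=3.3):
    """Work of the replaced bilinear elements divided by the work of the
    single biquadratic element that stands in for them."""
    return n_bilinear_replaced / cost_factor

print(round(relative_cost(4), 2))   # same number of nodes -> 1.21
print(round(relative_cost(16), 2))  # comparable accuracy  -> 4.85
```

A ratio of about 1.2 corresponds to the "slight savings" at equal node count, while a ratio near 4.8 is the source of the substantial savings when one biquadratic element matches 16 bilinear elements.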


4.1 5° Channel Flow


Figure  3:  5°  Converging  Channel,  12x3  grid 


Case                  RMS % Error in p    CPU (seconds)
40x10 Bilinear        1.9                 19
80x20 Bilinear        1.1                 126
12x3 Biquadratic      2.1                 9
20x5 Biquadratic      1.3                 26
40x10 Biquadratic     0.7                 150

Table 1: Comparison of Bilinear and Biquadratic Galerkin Solutions to 5° Wedge Problem


The 5° channel flow was computed using biquadratic elements on 12x3 and 40x10 grids. The 12x3
grid is shown in Figure 3. The densities along the surface for these cases, along with the exact
solution (dotted line), are shown in Figs. 7 and 8. Note the sharpness of the shocks and the
excellent agreement with the exact solution. For comparison purposes, the solution using the
bilinear Galerkin method is shown for a 40x10 grid in Fig. 9 and for an 80x20 grid in Fig. 10.
Note the smearing and low-wavenumber oscillations present near the shocks. These are due mainly
to dispersive error, and are not the result of artificial viscosity [35].

Table 1 presents a quantitative comparison of average error and computational effort for these
cases. This table shows some interesting facts. For about half the effort required in the 40x10
bilinear case (9 s versus 19 s), one can obtain nearly the same accuracy using a 12x3
biquadratic mesh (2.1% versus 1.9% RMS error). For about the same effort as an 80x20 bilinear
mesh (150 s versus 126 s), a 40x10 biquadratic mesh provides better accuracy (0.7% versus 1.1%).
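
The comparisons drawn from Table 1 can be reproduced with a short script. The numbers are transcribed from the table; the dictionary layout and the helper name `compare` are assumptions made for this example.

```python
# (RMS % error, CPU seconds), transcribed from Table 1
TABLE_1 = {
    "40x10 bilinear": (1.9, 19),
    "80x20 bilinear": (1.1, 126),
    "12x3 biquadratic": (2.1, 9),
    "20x5 biquadratic": (1.3, 26),
    "40x10 biquadratic": (0.7, 150),
}

def compare(case, baseline):
    """Return (error ratio, CPU ratio) of `case` relative to `baseline`."""
    err, cpu = TABLE_1[case]
    err_base, cpu_base = TABLE_1[baseline]
    return err / err_base, cpu / cpu_base

for case, baseline in [("12x3 biquadratic", "40x10 bilinear"),
                       ("40x10 biquadratic", "80x20 bilinear")]:
    err_ratio, cpu_ratio = compare(case, baseline)
    print(f"{case} vs {baseline}: error x{err_ratio:.2f}, CPU x{cpu_ratio:.2f}")
```

The first pairing gives about 1.11 times the error for 0.47 times the CPU time, and the second gives about 0.64 times the error for 1.19 times the CPU time.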


4.2  Subsonic,  Smooth  Flow 

One expects the biquadratic elements to be very good for smooth flows. To verify this,
$M_\infty = 0.5$ flow over a 10% cosine-squared bump was computed on a 24x8 biquadratic mesh and
a 60x20 bilinear mesh. Figure 11 shows contours of density for the biquadratic elements. The
contours are quite symmetric, as one would expect from a flow which remains completely subsonic.
Most of the non-smoothness seen in the contours is introduced by the plot package (which divides
each biquadratic element into 32 linear triangles), rather than actual errors in the flow. For
comparison, Fig. 12 shows these contours in the bilinear case. The agreement is quite good, and
the biquadratic case required about 3/4 the CPU time of the bilinear case.

Figure 4: Two-Dimensional Scramjet Inlet Initial Grid, 416 Elements


4.3 Mach 3 Flow in a Scramjet Inlet


The flow in a scramjet inlet at $M_\infty = 3$ was computed using adaptive bilinear elements and
with biquadratic elements. The geometry for this case is roughly that used in Kumar's paper
[17], and the initial grid for the bilinear case (and the only grid for the biquadratic case) is
shown in Fig. 4. The computed density solution for the bilinear case is shown in Fig. 13. There
are several interesting features to observe here. Note that the shock off the inlet entrance
reflects six times, and in particular note that near x = 3.7, y = 0.3, the shock bends slightly
toward the center line. Now look at the strut leading edge shocks, and note that the third
reflection of the shocks (starting near x = 2.8, y = ±0.3, and barely visible as "dents" in the
contour lines) passes through the sixth reflection of the entrance shock where the entrance
shock experiences the bend. Another cause of the bending is the interaction with the slip line
off the trailing edge. This slip line is not readily apparent in Fig. 13, but a plot of the
total pressure loss (not shown) at the exit clearly indicates its presence.


5  Adaptive  Degeneration  of  Biquadratic 
Elements 


The biquadratic elements formulated here perform quite well, even for flows with shocks. In some
cases, the mild oscillations seen near shocks may be highly undesirable. To overcome this
problem, an approach is taken in which the element residuals ($R_x$ and $R_y$, see Eq. (11))
near a shock are computed as if the biquadratic element were 4 bilinear Galerkin elements. This
idea has some similarities to the flux-corrected transport methods [18], but it is much simpler
to implement. There are two issues in this procedure: deciding which elements should be
degenerated into linear elements, and treating the interface between a bilinear region and a
biquadratic region.

The decision to degenerate an element requires the formation of a shock detector. For this
study, we tried a very simple shock detector consisting of a scaled second difference of
pressure. This selection was motivated by the choice of a switch for the non-linear
second-difference smoothing contribution above. To calculate the switch, the absolute value of
the nodal second difference of pressure is averaged over each element, and this average is
divided by the average pressure in the element. Where this quantity exceeds a certain threshold,
the element is degenerated. This indicator gives reasonable results for the very limited class
of problems tried so far, and more sophisticated indicators are under study.
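
The switch described above can be sketched as follows. The nodal second difference of pressure is assumed to have been computed already; the function name `degenerate_flags` is hypothetical, and the threshold is a free parameter whose value the text does not give.

```python
def degenerate_flags(elements, pressure, d2_pressure, threshold):
    """Flag the elements whose scaled second difference of pressure is large.

    elements:    list of tuples of node indices, one tuple per element
    pressure:    list of nodal pressures
    d2_pressure: list of nodal second differences of pressure
    threshold:   user-chosen cutoff (an assumed tuning parameter)
    """
    flags = []
    for nodes in elements:
        mean_abs_d2 = sum(abs(d2_pressure[n]) for n in nodes) / len(nodes)
        mean_p = sum(pressure[n] for n in nodes) / len(nodes)
        flags.append(mean_abs_d2 / mean_p > threshold)
    return flags
```

Dividing by the elemental mean pressure makes the indicator dimensionless, so one threshold can be reused across problems with different pressure levels.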

The degeneration of elements into linear elements introduces interfaces between the bilinear
regions and the biquadratic regions. In order to maintain conservation, it is necessary that
continuity of fluxes and state vectors be maintained across the interface. To accomplish this,
quantities at the mid-side node on the interface are set to the average of the two endpoint
values. This makes quantities on both sides of the interface linear, guaranteeing continuity.
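
The effect of averaging the mid-side node can be checked with the one-dimensional quadratic interpolant along an element edge. The sketch below is illustrative only: `edge_value` uses the standard quadratic Lagrange functions on nodes at s = -1, 0, 1, and both function names are hypothetical.

```python
def linearize_interface(values, end_a, mid, end_b):
    """Set the mid-side node to the average of the two edge endpoints."""
    values[mid] = 0.5 * (values[end_a] + values[end_b])
    return values

def edge_value(q_a, q_mid, q_b, s):
    """Quadratic interpolant along an edge, with s running from -1 to 1."""
    return (q_a * 0.5 * s * (s - 1.0)
            + q_mid * (1.0 - s * s)
            + q_b * 0.5 * s * (s + 1.0))
```

Once the mid-side value is the endpoint average, the quadratic term cancels and `edge_value` returns the straight-line interpolation between the endpoints, which is exactly what the bilinear side of the interface sees.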

Alternatively, one can re-derive the finite element formulation for the case in which some of
the interpolation functions are piecewise linear. This would have the advantage of not reducing
the resolution along the interface, an important consideration if the interfaces are not aligned
with the physical discontinuity. Time constraints prevented the implementation of this method at
this time.


5.1  Examples 


For the Mach 3 scramjet inlet of Section 4.3, density contours for the solution with biquadratic
elements are shown in Fig. 14. The resolution is not as good as in Fig. 13, but the bilinear
case used 14648 elements and required 51 minutes to compute on a 3-processor Alliant FX/8, while
the biquadratic case used 416 elements and required 2 minutes to compute. This case further
verifies the usefulness of the biquadratic elements.


Two examples are presented to illustrate the use of adaptive degeneration. The first,
$M_\infty = 0.675$ flow over a 10% circular bump, demonstrates the ability of the adaptive
degeneration to remove shock oscillations. The second example, $M_\infty = 2$ flow in a scramjet
inlet, demonstrates the ability of the degenerated elements to improve the robustness of the
biquadratic algorithm.

The $M_\infty = 0.675$ bump example was computed using 300 biquadratic elements. Mach number
contours are shown in Fig. 15. Note the formation of a normal shock on the bump. The significant
overshoots and undershoots produced by the shock are visible in Fig. 16, which shows Mach number
on the bump. Note also the odd-even mode propagating upstream of the shock. After degenerating 8
of the biquadratic elements into the equivalent of 32 bilinear elements, the oscillations and
overshoots are significantly reduced (Fig. 17).

The second example, $M_\infty = 2.0$ scramjet inlet flow, is interesting because at this low
Mach number the inlet unstarts, so a strong bow shock stands in front of the inlet. Since the
flow in the inlet chokes, the location of the bow shock is determined by the inlet exit
pressure. For this example, the exit pressure was set (somewhat arbitrarily) to 3.5 times the
free-stream pressure, which corresponds to the pressure jump through a Mach 1.77 normal shock.
In order to capture this bow shock, the grid of Fig. 4 was extended to include the region ahead
of the inlet. The extension took about 1 hour of the first author's time to generate, further
indicating the flexibility of the unstructured mesh approach. Figure 18 shows the Mach number
contours for the solution. The interfaces between the biquadratic and bilinear regions are shown
as dotted lines. The bow shock is captured clearly, as is the flow in the center passage. Note
the presence of the normal shock at x = 2.67, and observe the slip lines off the strut trailing
edges. A plot of pressure and density at the exit plane (Fig. 19) clearly shows that the feature
off the trailing edge of the struts is a slip line and not a shock. A slice through the inlet at
y = 0 (Fig. 20) shows the bow shock and normal shock after the throat. Note that M = 1 at the
throat (x = 2.16) as expected. This case cannot be run with biquadratic elements alone, because
the initial transient in the solution produces such strong overshoots at the shocks that the
calculation diverges. The final, converged solution also requires the linear elements around the
bow shock to prevent divergence. The final grid used 602 biquadratic elements and 208 bilinear
elements.

6  Three  Dimensional  Results 

The  extension  of  the  2-D,  bilinear  code  to  three  dimen¬ 
sions  is  straightforward,  involving  almost  no  changes  to  the 
basic  solver  structure.  The  extension  of  the  adaptation  data 
structures,  however,  is  far  more  complex.  The  computations 
involved  in  calculating  the  residual  for  the  Galerkin  method 
in  3-D  are  quite  involved,  and  since  the  2-D  results  showed 
little difference between the Galerkin and cell-vertex methods,
the cell-vertex method was chosen for residual computations.
Smoothing and boundary conditions are done as
in  the  2-D  solver,  with  the  addition  of  a  second  tangential 
velocity  to  the  characteristics  of  Eq.  16. 

Figure 5: Geometry for 10° Double Wedge Flow

To verify the three dimensional code, the test problem of
M∞ = 2.5 flow in a converging channel with both the y and
z walls sloped at 10° was computed on a 30x30x30 grid.
The geometry for this case is shown in Fig. 5. The double
wedge introduces regions in which the flow may have passed
through zero, one or two shocks. In addition, there are other
kinds  of  interactions  present  (discussed  fully  by  Kutler  [19]). 
Figure  21  presents  the  density  on  a  slice  perpendicular  to 
the  flow  direction.  The  shocks  can  be  seen  clearly,  and  in 
the  region  where  the  flow  has  passed  through  both  shocks 
(the  upper  right  corner),  the  results  of  the  interactions  are 
visible.  In  particular,  note  the  bending  of  the  shocks  as 
they  pass  through  each  other  and  interact.  This  behavior  is 
more  pronounced  as  the  shock  strength  increases.  Figure  22 
shows  the  Mach  number  on  the  plane  y  =  0.5.  Here,  the 
bending  and  interaction  is  clearly  visible. 

6.1 Adaptation Criteria

In  order  to  decide  which  cells  to  divide  or  undivide,  one 
needs  to  define  some  sort  of  adaptation  parameter.  There  is 
a  large  amount  of  literature  indicating  possible  choices  for 
this  adaptation  parameter  [20,21,22].  In  this  paper  an  indi¬ 
cator  designed  to  capture  shocks  based  on  a  scaled  second 
difference  of  density  is  used.  The  second-difference  based 
parameter  is  computed  as  follows: 

1.  Compute  a  nodal  second  difference  of  density. 

2. Compute a nodal first difference as follows:

$$\text{First Difference}_i = \frac{1}{n_i} \sum_j \left| \rho_i - \rho_j \right| \qquad (19)$$

where j ranges over all nodes in all elements containing
node i, and n_i is the number of elements containing
node i.

Figure 6: Cutaway View of Three-Dimensional Interface

3.  Compute  an  average  density  at  the  nodes. 

4. Compute a nodal adaptation switch:

$$\text{Switch} = \frac{|\text{Second Difference}|}{|\text{First Difference}| + \epsilon \times \text{Average}}$$

where ε is a small parameter (of order 0.05) added to
“smooth” the switch in smooth portions of the flow.
Without it, small oscillations in the flow tend to produce
large values of the switch.

5.  Distribute  the  nodal  switch  to  the  elements. 

6.  If  the  switch  is  greater  than  some  value  (usually  around 
0.15),  divide;  if  it  is  less  than  some  value  (usually 
around  0.05),  undivide. 
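
The six steps above can be sketched in a few lines of Python. This is only an illustration on a one-dimensional chain of linear elements, not the paper's 3-D implementation. The source does not give the exact forms of the nodal second difference, the nodal average or the node-to-element distribution, so the undivided second difference, the neighbour mean and the "largest nodal value" rule used here are assumptions, as is the function name `adaptation_flags`.

```python
def adaptation_flags(rho, eps=0.05, divide_above=0.15, undivide_below=0.05):
    """Return (nodal switch, element flags) for a 1-D chain of linear elements.

    Flags: +1 = divide, -1 = undivide, 0 = leave unchanged.
    """
    n = len(rho)
    switch = []
    for i in range(n):
        # Neighbours of node i; in 1-D each neighbour shares one element with i.
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        n_elem = len(nbrs)
        # Step 1 (assumed form): undivided second difference of density.
        second = abs(sum(rho[j] - rho[i] for j in nbrs))
        # Step 2, Eq. 19: absolute first differences averaged over n_i elements.
        first = sum(abs(rho[i] - rho[j]) for j in nbrs) / n_elem
        # Step 3 (assumed form): mean density over the node and its neighbours.
        average = (rho[i] + sum(rho[j] for j in nbrs)) / (1 + n_elem)
        # Step 4: second difference scaled by first difference plus eps * average.
        switch.append(second / (first + eps * average))
    # Step 5 (assumed rule): an element takes the larger of its two nodal values.
    elem_switch = [max(switch[k], switch[k + 1]) for k in range(n - 1)]
    # Step 6: compare against the divide / undivide thresholds.
    flags = [1 if s > divide_above else (-1 if s < undivide_below else 0)
             for s in elem_switch]
    return switch, flags


# A density jump between nodes 3 and 4 stands in for a captured shock.
switch, flags = adaptation_flags([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0])
```

With this step profile only the three elements touching the jump are marked for division, and the uniform regions on either side fall below the undivide threshold.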

This indicator is based on results from interpolation theory
which indicate that the interpolation error in a linear
scheme is proportional to the second derivative of the quantity
being interpolated. The first difference is used as a
“fudge  factor”  to  reduce  the  sensitivity  of  the  switch  to 
greatly  differing  shock  strengths.  Without  the  first  differ¬ 
ence  scaling,  the  stronger  shocks  in  the  flow  tend  to  receive 
most  of  the  adaptation,  while  the  weaker  ones  receive  very 
little.  Other  advantages  of  this  indicator  are  explained  by 
Löhner [21].

6.2 Interface Treatment

The  presence  of  embedding  results  in  interfaces  between 
the  coarse  and  fine  regions  of  the  grid.  In  three  dimensions 
there  are  two  types  of  interface  nodes:  mid-face  nodes  and 
mid-edge  nodes.  Figure  6  shows  a  cutaway  view  of  the  in¬ 
terface  between  a  fine  (to  the  right)  and  coarse  region  (to 
the left) in three dimensions. In our treatment, the fluxes
and state vectors at a mid-edge node (node 6) are the averages
of the fluxes and state vectors at the ends of the edge
(nodes  1  and  4  in  the  figure).  The  fluxes  and  state  vectors 
at  the  mid-face  node  (node  5)  are  the  averages  of  the  fluxes 
and  state  vectors  at  the  corner  nodes  (nodes  1-4).  This 
treatment  is  easily  vectorised,  and  the  total  time  required 
to  treat  the  interfaces  is  negligible. 
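
The averaging rule for interface nodes can be written as a short sketch. The dictionary-based storage and the name `fill_interface_nodes` are hypothetical and chosen for readability. A production code would more likely use index arrays so the loop vectorises, as the text suggests.

```python
def fill_interface_nodes(state, mid_edge, mid_face):
    """Set each interface node to the average of its parent nodes.

    state    : dict mapping node id -> list of floats (state or flux vector)
    mid_edge : dict mapping a mid-edge node -> its two edge end nodes
    mid_face : dict mapping a mid-face node -> its four face corner nodes
    """
    for node, parents in list(mid_edge.items()) + list(mid_face.items()):
        vectors = [state[p] for p in parents]
        # Component-wise mean over the parent vectors.
        state[node] = [sum(vals) / len(parents) for vals in zip(*vectors)]
    return state


# Node numbering follows the description above: corners 1-4, mid-face node 5,
# mid-edge node 6 lying between nodes 1 and 4.
state = {1: [1.0, 0.0], 2: [2.0, 2.0], 3: [5.0, 4.0], 4: [4.0, 6.0]}
fill_interface_nodes(state, mid_edge={6: (1, 4)}, mid_face={5: (1, 2, 3, 4)})
```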

6.3 Example

A three-dimensional scramjet test case with M∞ = 5 was
computed to demonstrate the 3-D adaptive algorithm. The
geometry of the three-dimensional inlet is shown in Fig. 23,
and is based on the two-dimensional geometry above. The
leading edge sweep angle is 30°. A slice at z = 0 is identical
to the two-dimensional geometry, and the slice at z = 1 has
the portion of the struts and compression surfaces forward of
the throat extended to give a 30° leading edge sweep. This
weakens the compressions near z = 1, and forces the flow
to  turn  down.  Although  the  actual  inlet  proposed  for  the 
NASP  project  has  a  cowl  plate  which  extends  back  from 
the  throat,  for  the  purposes  of  this  study  the  cowl  plate 
will  be  assumed  to  extend  to  the  inlet  mouth.  For  this 
case,  contours  of  density  at  three  z  locations  are  shown  in 
Fig. 25. Compare the 3-D contours with the 2-D, M∞ = 5
density  contours  shown  in  Fig.  24,  and  note  the  differences 
introduced  by  the  third  dimension.  Unfortunately,  due  to 
problems  with  the  adaptation  indicator,  the  side  passages 
did  not  receive  enough  adaptation  in  the  3-D  case,  but  the 
central  passage  indicates  many  of  the  important  differences. 
At  z  =  0.5  and  z  =  0.87,  note  that  the  strut  leading  edge 
shocks  reflect  multiple  times  in  the  throat  area,  resulting  in 
a  significantly  different  flow  in  the  expansion  region.  Also 
note  the  differences  in  the  slip  line  positions  for  the  three  z 
slices.  In  the  z  =  0.13  slice,  the  slip  line  and  the  trailing  edge 
shock  are  nearly  on  top  of  one  another,  while  at  z  =  0.87 
they are quite distinct. Figure 26 shows a density slice at
y = 0. Features to note here are the coalescences of the multiple
reflections in the throat area near x = 1.2, z = 0.2,
and also a very faint, 3-D reflection extending from about
x = 1, z = 0 to about x = 2, z = 1, due to the turning
produced by the 30° strut sweep. This case was computed
adaptively  on  the  IBM  3090  at  Cornell,  had  188692  elements 
on  the  final  grid  and  took  over  12  hours  of  CPU  time.  Un¬ 
fortunately,  the  mesh  is  not  fine  enough  to  capture  many  of 
the  interesting  interactions.  If  one  more  level  of  embedding 
is  used,  over  1.3  million  elements  would  be  needed,  requir¬ 
ing  about  480  megabytes  of  memory  and  about  1  week  of 
single-processor  IBM  3090  CPU  time. 

7  Conclusions  and  Further  Work 

The  use  of  biquadratic  finite  elements  in  the  solution  of 
the  Euler  equations  can  result  in  significant  computational 
cost  reductions  for  many  problems.  The  elements  do  show 
some oscillation near flow discontinuities, but these effects
are  relatively  minor  in  many  problems.  For  the  times  when 
oscillations  are  not  acceptable,  an  adaptive  degeneration  of 
a  biquadratic  element  into  4  bilinear  elements  proves  quite 
helpful. 

Further  work  is  required  on  the  adaptive  degeneration 
concept.  Two  areas  for  research  are  improving  the  degener¬ 
ation  criterion  and  fixing  the  interface  formulation  to  allow 
piecewise-linear  interfaces.  In  addition  the  flux-corrected 
transport  concept  could  be  implemented  as  an  alternative 
to  degeneration. 

The  extension  of  the  cell-vertex  method  to  three  dimen¬ 
sions  was  demonstrated,  and  the  utility  of  adaptation  was 
briefly  explored.  We  consider  the  results  reported  here  to 
be  preliminary,  as  there  is  much  work  yet  to  be  done  on  the 
choice  of  adaptation  indicator,  as  well  as  work  on  a  better 
artificial  viscosity  formulation.  Further  areas  of  research 
include  the  development  of  triquadratic  elements.  If  the 
savings  realized  in  2-D  apply  to  three  dimensions,  a  single 
triquadratic  element  may  well  do  the  work  of  64  trilinear 
elements. 

Finally,  work  is  in  progress  on  adapting  the  finite  element 
codes  to  massively  parallel  computers  such  as  the  Connec¬ 
tion  Machine.  Preliminary  results  indicate  that  speeds  on 
the  order  of  500  Megaflops  are  possible  for  the  unstructured 
codes,  and  it  is  possible  that  speeds  in  excess  of  1  Gigaflop 
are  achievable. 
Figure 7: Surface Density, M∞ = 2, 5° Converging Channel,
Biquadratic Elements, 12x3 Grid

Figure 8: Surface Density, M∞ = 2, 5° Converging Channel,
Biquadratic Elements, 40x10 Grid

Figure 9: Surface Density, M∞ = 2, 5° Converging Channel,
Bilinear Elements, 40x10 Grid

Figure 11: Density, 10% cos² Bump, M∞ = 0.5, 24x8 Grid, Biquadratic Elements

Figure 12: Density, 10% cos² Bump, M∞ = 0.5, 60x20 Grid, Bilinear Elements

Figure 23: Geometry of Three-Dimensional Inlet (Top View, Side View (y = 0), Bottom View)

Figure 25: Density in x-y Plane, M∞ = 5, 3-D Case

Figure 26: Density at y = 0, M∞ = 5, 3-D Case
Abstract

A new parallel programming environment, Id World, is
under development by MIT’s Laboratory of Computer Science.
This approach to parallel processing is distinct in
that it employs a dataflow architecture and a functional
programming language. In this paper, the application of Id
World for CFD is explored by solving the 2D Euler equations
using Jameson’s scheme for the flow past a circular
arc bump in a wind tunnel. Both subsonic and supersonic
cases were run for a 17 x 5 grid. The parallel code was
implemented in approximately 400 lines, equivalent to a
sequential FORTRAN implementation. For one iteration,
the average number of processors used in parallel was over
2000. Overall parallelism in the code as well as parallelism
of specific steps of the method within the code are examined.
Ease of implementation, determinacy, and memory
allocation and deallocation are also discussed.

1  Introduction 

Current state-of-the-art single-processor computers are
pushing  the  upper  bounds  in  computational  speed.  The 
hardware  capabilities  of  these  machines  are  limited  due  to 
single-processor  technology  approaching  its  theoretical  lim¬ 
its.  Therefore,  in  order  to  continue  increasing  computa¬ 
tion  speed,  multiple  processors  in  parallel  must  be  used. 
This  requires  the  development  of  parallel  architectures,  lan¬ 
guages  and  algorithms.  Due  to  the  current  dominance  of 
von  Neumann  architectures  and  imperative  languages,  such 
as FORTRAN and C, the development of parallel von Neumann
architectures and parallel imperative languages seems
like a natural evolution [1]. A considerable amount of effort
is being directed towards this goal.


However,  another  very  distinct  approach  to  the  parallel 
programming  problem  is  being  researched  in  the  Compu¬ 
tation  Structures  Group  at  MIT’s  Laboratory  of  Computer 
Science. They have developed a parallel environment called
Id World based on a dataflow architecture, a
“Multiple-Instruction/Multiple-Data” architecture, and a functional
programming language called Id. Although Id is designed
for dataflow machines, it is not limited to this class of architectures.
The Computational Fluid Dynamics Laboratory
in MIT’s Department of Aeronautics and Astronautics
is also interested in studying parallel architectures and algorithms
for CFD applications. In this paper we will report
the results of implementing a 2D Euler solver in Id
World. The algorithm, given to students in the advanced
CFD course, intuitively possesses a high degree of parallelism.
Through the use of Id World the amount of parallelism
inherent in the algorithm is explored.


2  Algorithm  Description 


The algorithm implemented in Id solves the steady compressible
Euler equations for the flow past a circular arc in
a 2D channel using Jameson’s scheme [2]. The compressible
Euler equations in Cartesian coordinates can be written in
vector form as

$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = 0 \qquad (1)$$

where the state vector U and the flux vectors F and G are
defined by

$$U = \begin{pmatrix} \rho \\ \rho u \\ \rho v \\ \rho E \end{pmatrix}, \quad F = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u H \end{pmatrix}, \quad G = \begin{pmatrix} \rho v \\ \rho u v \\ \rho v^2 + p \\ \rho v H \end{pmatrix} \qquad (2)$$

Assuming a perfect gas, with specific heat ratio γ, pressure
and enthalpy are defined as

$$p = (\gamma - 1)\,\rho \left( E - \tfrac{1}{2}\,(u^2 + v^2) \right) \qquad (3)$$

$$H = E + \frac{p}{\rho} \qquad (4)$$

where ρ, u, v, E, p, and H are, respectively, the density,
Cartesian components of velocity, total energy per unit
mass, pressure and total enthalpy per unit mass. The Euler
equations represent the conservation of mass, momentum
and energy for an inviscid perfect gas.


The first step is to generate a computational grid representing
the physical plane using an algebraic grid generator.
The entire flowfield is then initialised to the freestream values
and the boundary conditions are set. An outer loop
is set up to iterate and solve the entire flowfield. Within
this loop, the local time step, denoted as Δt, and artificial
viscosity are calculated once and a four-stage Runge-Kutta
method is then used to improve the solution. The Runge-Kutta
method involves calculating the fluxes over the sides
of the cells and then calculating the residuals. The residual
of a cell is the discrete approximation of
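where R_ij is termed the residual. The Runge-Kutta method
is as follows:

$$\begin{aligned}
U_{ij}^{(0)} &= U_{ij}^{n} \\
U_{ij}^{(1)} &= U_{ij}^{(0)} - \alpha_1 \Delta t_{ij}\, R(U_{ij}^{(0)}) \\
U_{ij}^{(2)} &= U_{ij}^{(0)} - \alpha_2 \Delta t_{ij}\, R(U_{ij}^{(1)}) \\
U_{ij}^{(3)} &= U_{ij}^{(0)} - \alpha_3 \Delta t_{ij}\, R(U_{ij}^{(2)}) \\
U_{ij}^{(4)} &= U_{ij}^{(0)} - \alpha_4 \Delta t_{ij}\, R(U_{ij}^{(3)}) \\
U_{ij}^{n+1} &= U_{ij}^{(4)}
\end{aligned}$$

with α1 = 1/4, α2 = 1/3, α3 = 1/2, and α4 = 1. This scheme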
is  of  second  order  accuracy  for  a  smooth  mesh.  It  should  be 
noted  here  that  there  are  dependencies  between  stages.  How 
these  dependencies  affect  parallelism  will  be  examined  later. 
The inflow, outflow, and solid wall boundary conditions are
imposed  after  each  stage  of  the  Runge-Kutta  method.  The 
convergence  criterion  is  based  on  the  variation  in  pressure 
between  the  previous  iteration  and  the  current  iteration. 


Both a root-mean-square test and the maximum pressure
difference are calculated. If the convergence criterion is not
met, the outer loop is repeated until convergence. Calculations
were done for subsonic and supersonic flow past a
circular arc bump in a channel using a 17 x 5 grid.
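
The stage structure and the convergence test can be illustrated with a minimal Python sketch. It is not the Id code from the paper. The stage coefficients 1/4, 1/3, 1/2, 1 are the values commonly used with this kind of four-stage scheme and are assumed here, and the residual is passed in as a generic function rather than computed from cell fluxes. The names `runge_kutta_step` and `pressure_change` are illustrative.

```python
def runge_kutta_step(u, dt, residual, alphas=(0.25, 1.0 / 3.0, 0.5, 1.0)):
    """Advance the cell values u by one iteration of the multistage scheme.

    u        : list of cell values
    dt       : list of local time steps, one per cell
    residual : function mapping a list of cell values to a list of residuals
    """
    u0 = list(u)      # stage-0 state; every stage restarts from it
    stage = list(u)
    for alpha in alphas:
        r = residual(stage)   # each stage needs the result of the previous one
        stage = [u0_k - alpha * dt_k * r_k
                 for u0_k, dt_k, r_k in zip(u0, dt, r)]
    return stage


def pressure_change(p_old, p_new):
    """Return (root-mean-square, maximum) pressure change between iterations."""
    diffs = [abs(a - b) for a, b in zip(p_old, p_new)]
    rms = (sum(d * d for d in diffs) / len(diffs)) ** 0.5
    return rms, max(diffs)


# Toy check with R(u) = u, i.e. du/dt = -u in every cell.
u_new = runge_kutta_step([1.0, 2.0], [0.1, 0.1], lambda u: list(u))
rms, biggest = pressure_change([1.0, 1.0], [1.0, 1.5])
```

Within one stage every cell can be updated independently, while the stages themselves must run in sequence. This is the dependency between stages noted in the text.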


3  Id  World 


In  the  engineering  world  FORTRAN  is  the  dominant  lan¬ 
guage.  The  usual  approach  to  parallel  programming  is  to 
annotate  FORTRAN  code  with  parallel  constructs  and  then 
to  compile  the  code  for  a  parallel  architecture.  For  some  ma¬ 
chines,  such  as  the  Alliant,  compilers  exist  which  automati¬ 
cally  parallelise  FORTRAN  code,  if  it  is  suitably  written  for 
the  particular  parallel  strategy.  Often,  the  determinacy  of 
the  program  is  left  to  the  programmer;  i.e.,  the  programmer 
must  insert  appropriate  synchronisations  to  ensure  that  for 
a  given  set  of  inputs,  the  output  will  always  be  the  same. 
Debugging  becomes  extremely  difficult  when  the  program 
may  produce  different  outputs  for  different  machine  config¬ 
urations  and/or  scheduling  policies.  In  addition,  this  behav¬ 
ior  may  not  be  immediately  obvious.  What  further  compli¬ 
cates  these  time-dependent  errors  is  that  they  may  not  even 
be reproducible in a debugger [3]. A fundamental question
arises:  "What  is  the  correct  output  for  a  given  set  of  inputs; 
i.e.,  is  the  parallel  program  executing  correctly?" 

Id World was designed to overcome the problem of determinacy
and easily enable one to study the parallel behavior
of algorithms. Id World consists of the parallel programming
language Id, an Id compiler, and GITA (Graph Interpreter
for the Tagged-Token Architecture), an emulation
of the MIT Tagged-Token Dataflow Architecture (TTDA).
Monsoon is a hardware implementation of the TTDA with
extensions.

In Id World, programs are written in the parallel programming
language Id. One then compiles Id programs using
the Id compiler, which produces dataflow
graphs. Next, the dataflow graphs are executed on a tagged-token
dataflow architecture, using either GITA or Monsoon. Once
a program has finished execution, the parallelism of the algorithm
can be examined from the parallelism profiles.


3.1  Id 

Id is a general-purpose high-level parallel language with
data structures suitable for scientific applications and symbolic
computations (AI applications). Id is a functional language.
In a functional language, a data structure can never
be overwritten: there are “no side effects”. Therefore updating is
accomplished through the copying of data structures. This
will quickly result in memory problems, an issue which will
be discussed in more detail later. For all purely functional
languages, the absence of side effects guarantees determinacy.
Another feature of Id is that the parallelism is implicit in
the operational semantics of the language. The programmer
does not explicitly break up a task into parallel components,
and therefore does not worry about synchronisation. The
amount of parallelism in a program is determined by the
data dependencies in the algorithm. The Id compiler determines
only the essential data dependencies in the algorithm,
thus relieving the programmer of the task [3].

In addition, Id has dynamic storage allocation which enables
extra storage to be dynamically allocated for more
parallelism. Id also supports fine-grain parallelism. The
more finely the Id compiler divides a program into tasks,
the greater the opportunity for parallel execution. However,
the overhead cost of synchronisation must be minimised in
order to gain in performance [4]. Instead of addressing this
problem at the software level, it is addressed and
solved at the hardware level. The Monsoon processor achieves
a low synchronisation cost.

Id is a layered language, being purely functional at its
core. Id is extended with I-structures as a parallel data-structuring
facility. An I-structure is a data structure, such
as an array, that is immediately allocated memory, but
does not need to be written immediately. An I-structure can
never be overwritten [5]. In comparison, in a purely functional
language a data structure is both allocated memory
and written immediately, and then can never be overwritten.

An important control structure for numerical computations
is an array comprehension. Array comprehensions allow
an array or matrix to be created and initialised. Array
comprehensions initialise arrays that may either be purely
functional data structures or I-structures [6]. The following
array a must be an I-structure.

a = {1d_array (1,4)
     | [1] = 2.0
     | [j] = a[j-1] + 2 || j <- 2 to 4};

The above array comprehension creates and initialises an
array with bounds from 1 to 4. This array is bound to the
variable a. The first symbol of the array comprehension,
in this case 1d_array, specifies the type of the array, i.e. 1D,
2D, etc., while the second component specifies the bounds
of each dimension. This is followed by a set of mutually
exclusive clauses, where each clause is separated by a |,
that initialise the array entries. The first clause fills the
first entry of the array with the value 2.0. The second
clause fills entries 2 through 4 of the array with a[j-1] +
2. The symbol || indicates the range over which j is to be
looped.
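
The write-once behaviour of an I-structure, and the way an array entry can be defined from an earlier entry of the same array, can be mimicked in Python. The class below is a hypothetical illustration, not part of Id or Id World. On a dataflow machine a read of an unwritten slot would, as far as we understand I-structure semantics, simply wait for the write. This sequential sketch raises an error instead.

```python
class IStructure:
    """Write-once array: slots are allocated up front, each filled at most once."""

    _EMPTY = object()   # sentinel marking a slot that has not been written yet

    def __init__(self, low, high):
        self.low = low
        self._slots = [self._EMPTY] * (high - low + 1)

    def write(self, index, value):
        k = index - self.low
        if self._slots[k] is not self._EMPTY:
            raise ValueError(f"slot {index} has already been written")
        self._slots[k] = value

    def read(self, index):
        value = self._slots[index - self.low]
        if value is self._EMPTY:
            # Sequential stand-in for a read that would be deferred in hardware.
            raise LookupError(f"slot {index} has not been written yet")
        return value


# Same recurrence as the comprehension above: a[1] = 2.0, a[j] = a[j-1] + 2.
a = IStructure(1, 4)
a.write(1, 2.0)
for j in range(2, 5):
    a.write(j, a.read(j - 1) + 2)
values = [a.read(j) for j in range(1, 5)]
```

The array is allocated before any entry exists, and each entry becomes readable as soon as it is written. This is what lets later entries depend on earlier ones without any overwriting.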

A CFD application of array comprehensions is filling all
the interior cells of a grid, and then specifying the boundary
conditions at the inlet and outlet. Therefore, one matrix is
created containing all the information for the grid [7]. The
following is a typical application of array comprehensions.


U = {2d_array ((0,imax),(0,jmax))
     | [i,j] = interior (i,j) || i <- 1 to imax-1
                              & j <- 0 to jmax
     | [0,j] = inlet (j) || j <- 0 to jmax
     | [imax,j] = outlet (j) || j <- 0 to jmax};

The array comprehension creates and initialises a matrix
with bounds 0 to imax for the first dimension and bounds
0 to jmax for the second dimension. The matrix is bound
to the variable U. The first clause fills the i,j entries of the
matrix, where i goes from 1 to imax-1 and j goes from 0 to
jmax, with the values returned from the function interior.
The values returned from interior may be any data structure.
For example, if a vector is returned from interior
then each entry of U contains a vector. Similarly, the inlet
and outlet cells of the grid are filled by the values returned
from the functions inlet and outlet, respectively. This is
illustrated in Figure 1.
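
For readers more familiar with imperative languages, a rough Python analogue of this comprehension is sketched below. The helper `build_grid` and the placeholder cell functions are hypothetical. The point is only that each clause owns a disjoint set of indices, so every cell is defined exactly once.

```python
def build_grid(imax, jmax, interior, inlet, outlet):
    """Build an (imax+1) x (jmax+1) grid from three mutually exclusive clauses."""
    def cell(i, j):
        if i == 0:
            return inlet(j)        # clause [0,j]
        if i == imax:
            return outlet(j)       # clause [imax,j]
        return interior(i, j)      # clause [i,j], 1 <= i <= imax-1
    return [[cell(i, j) for j in range(jmax + 1)] for i in range(imax + 1)]


# Placeholder cell functions; each returns a tuple, showing that an entry of U
# may itself be a compound data structure.
U = build_grid(
    3, 1,
    interior=lambda i, j: ("interior", i, j),
    inlet=lambda j: ("inlet", j),
    outlet=lambda j: ("outlet", j),
)
```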


Figure  1:  Initialisation  of  U 


In addition to array comprehensions, Id has a large library
of array functions necessary for numerical computations.
The array library is implemented to achieve maximum
parallelism of the functions.

Another important data structure in Id is a “tuple”. A
tuple is a data structure that represents a collection of objects.
An n-tuple contains n objects. The primary use of
tuples in this application was for array/matrix bounds and
for returning multiple values from a function. A function
can return only one value. A tuple can be used to package
results when a function needs to return multiple values.
The following example illustrates the use of tuples.

bounds = ((0,imax),(0,jmax));

def calc_pressure p_infinity =
  {def pressure (i,j) = p_infinity[i,j] ;
   in
     make_matrix bounds pressure};

In this example, bounds is a 2-tuple where each element of
the tuple is a 2-tuple that represents the bounds of the matrix.
The key word def indicates a function definition. The
function calc_pressure is creating and initialising a matrix
with bounds bounds. calc_pressure takes one argument,
p_infinity, which in this case is a matrix. The body of the
function is expressed as a “block expression”. A block expression
contains a set of statements at the end of which is
the key word in. The expression following in represents the
value being returned by the block. In this example, a matrix
is being created and initialised by make_matrix, which
is a function in the array library that takes two arguments,
the bounds of the matrix and a function. The entries of the
matrix are being filled by the values returned from the function
pressure. Also note within the block expression that
the function pressure takes one argument, the tuple (i,j),
which will be bound to each pair of indices of the matrix
being created. In our application, we extensively use array
comprehensions and make_matrix, thus approximately 30%
of the data structures we use are tuples.
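
The same pattern, a tuple of bounds plus a function applied to every index pair, can be sketched in Python. The `make_matrix` below is a stand-in written for this illustration and is not the Id library routine. Its dictionary result and the explicit `bounds` argument to `calc_pressure` are assumptions made to keep the sketch self-contained.

```python
def make_matrix(bounds, f):
    """Create a matrix over the given index bounds, filling entry (i, j) with f((i, j))."""
    (ilo, ihi), (jlo, jhi) = bounds          # unpack the 2-tuple of 2-tuples
    return {(i, j): f((i, j))
            for i in range(ilo, ihi + 1)
            for j in range(jlo, jhi + 1)}


def calc_pressure(p_infinity, bounds):
    """Build a pressure matrix by looking up each entry of p_infinity."""
    def pressure(index):
        i, j = index                          # the single argument is an index tuple
        return p_infinity[i][j]
    return make_matrix(bounds, pressure)


bounds = ((0, 1), (0, 2))
P = calc_pressure([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], bounds)
```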

3.2 Dataflow Graphs

A dataflow graph is the parallel machine language of
dataflow machines. A node in a dataflow graph is an instruction
which executes only when the operands it requires
are available. The arcs in the graph represent the essential
data dependencies among instructions; execution is “data-driven”. A
dataflow graph imposes only a partial order of execution;
all execution orders which obey the data dependencies in
the graph yield the same result. This is the source of parallelism
in dataflow graphs: a valid execution is to process in
parallel all instructions which have their required operands
[8]. For the following block expression (Example 4), the
dataflow graph is shown in Figure 2.

{ x = 3*3 ;
  y = x*x + 3*3 ;
  z = 1 + 3 ;
  in
    (x*y)*(y-z) };

In  contrast,  a  sequential  processor  imposes  a  total  order  on 
all  operands  within  a  task. 
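
The claim that every execution order obeying the data dependencies gives the same result can be checked with a small data-driven interpreter. The sketch below is written in Python and is not GITA. The decomposition of the block expression into eight instructions, the intermediate node names (`xx`, `t`, `xy`, `yz`, `out`) and the reading of the third binding as z are assumptions made for illustration.

```python
import operator
import random

# Instruction name -> (operation, names of the two inputs).
# The inputs "1" and "3" are literal constants, available from the start.
GRAPH = {
    "x":   (operator.mul, ("3", "3")),    # x = 3*3
    "xx":  (operator.mul, ("x", "x")),    # x*x
    "t":   (operator.mul, ("3", "3")),    # 3*3 inside the definition of y
    "y":   (operator.add, ("xx", "t")),   # y = x*x + 3*3
    "z":   (operator.add, ("1", "3")),    # z = 1 + 3
    "xy":  (operator.mul, ("x", "y")),    # x*y
    "yz":  (operator.sub, ("y", "z")),    # y - z
    "out": (operator.mul, ("xy", "yz")),  # (x*y)*(y-z)
}


def run(graph, constants, rng):
    """Fire ready instructions one at a time, in an order chosen by rng."""
    values = dict(constants)
    pending = dict(graph)
    while pending:
        ready = [name for name, (_, inputs) in pending.items()
                 if all(i in values for i in inputs)]
        name = rng.choice(ready)          # any ready instruction may go next
        op, inputs = pending.pop(name)
        values[name] = op(*(values[i] for i in inputs))
    return values["out"]


results = {run(GRAPH, {"1": 1, "3": 3}, random.Random(seed)) for seed in range(10)}
```

Whatever order the random generator picks, the set `results` contains a single value, because an instruction can only fire once both of its operands exist.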


3.3 Tagged-Token Dataflow Architecture

The TTDA is a data-driven machine which directly executes
dataflow graphs. It is called a tagged-token dataflow
architecture because each operand to an instruction is
tagged with an identifier. A token is an operand and its
tag. Therefore, a two operand instruction has two tokens
that must have matching tags. Moreover, tags distinguish
tokens from different “activations” [3]. For example, all of
the iterations of a loop potentially may be operating in parallel
and therefore represent different activations.
Figure 2: Dataflow Graph


If Example 4 were in a loop from 1 to 4 then there would be
4 activations of that block expression. Now examining just
the first line of that block expression, x = 3*3, the operands
are 3 and 3 while the instruction is *. Each operand to the
instruction * has a tag that identifies to which activation the
operand belongs. The tokens destined for each instance of
* are distinguished by differing tags. Therefore, for this line
of code, within each activation there are two tokens destined
for the instruction * which must have matching tags.

A combination of hardware and compiler disciplines guarantees
that only two tokens will ever have the same tag for a
two operand instruction. This same feature guarantees the
determinacy of the program.


3.4  Hardware 

GITA is a software system that interprets dataflow
graphs. GITA is run on a Multiprocessor Emulation Facility
(MEF). The MEF consists of 33 TI Explorer Lisp Machines
linked together with a communication network which emulate
parallel processing. The MEF is used primarily for
code development and debugging. GITA performance on
one TI Explorer is only 1,000 instructions per second, which
is prohibitively slow for running any large applications. The
advantage of networking is that performance and memory
increase proportionally with the number of machines used.
However, actual execution on the machines is asynchronous.

The Monsoon is a 256 processor dataflow machine with
a TI Explorer as its front end. The Monsoon is a TTDA
implementation with extensions for efficient matching of tokens.
Currently, one prototype processor has been built.
The performance for one processor is i MIPS, while an estimate
for 256 processors is 1,000 MIPS (assuming 50% processor
utilisation).

3.5 Parallelism Profiles

The parallelism profile of an algorithm is the number of
processors versus computation timesteps (there is no connection
between a computation timestep and any unit of real
time). For the previous code block, the parallelism profile
is shown in Figure 3.


Figure  3:  Parallelism  Profile 

A program can be run in one of two modes: Idealised
Mode or Finite Processor Mode. Idealised Mode assumes
the following: every instruction executes in a unit timestep,
there is no communication latency from one instruction to
its successor, every instruction executes at the earliest
possible timestep, i.e., as soon as its input data is ready, and
finally, an infinite number of processors is available. Finite
Processor Mode assumes the following: every instruction executes in a
unit timestep, the result of an instruction takes m timesteps
to reach its successor instruction (the communication latency),
and there are n processors (limited from 1 to 1,000)
[3].
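
The two modes can be mimicked with a small scheduler that counts how many instructions fire at each timestep. This Python sketch is an illustration under stated assumptions rather than a description of how Id World gathers its profiles. The dependency table reuses an eight-instruction decomposition of the earlier block expression, with hypothetical node names, and the finite-processor case simply fires the first n ready instructions.

```python
def parallelism_profile(graph, processors=None, latency=0):
    """Return the number of instructions fired at each timestep.

    graph      : dict mapping instruction -> tuple of instructions it depends on
    processors : maximum instructions per timestep (None = idealised, unlimited)
    latency    : extra timesteps before a result reaches its successors
    """
    available = {}            # instruction -> first timestep its result is usable
    pending = dict(graph)
    profile = []
    t = 0
    while pending:
        ready = [name for name, inputs in pending.items()
                 if all(available.get(i, float("inf")) <= t for i in inputs)]
        fired = ready if processors is None else ready[:processors]
        for name in fired:
            del pending[name]
            available[name] = t + 1 + latency   # unit execution time plus latency
        profile.append(len(fired))
        t += 1
    return profile


# Dependencies of the block expression; instructions with only constant
# operands have no dependencies and can fire in the first timestep.
DEPS = {
    "x": (), "t": (), "z": (),
    "xx": ("x",),
    "y": ("xx", "t"),
    "xy": ("x", "y"),
    "yz": ("y", "z"),
    "out": ("xy", "yz"),
}

ideal = parallelism_profile(DEPS)
serial = parallelism_profile(DEPS, processors=1)
delayed = parallelism_profile(DEPS, latency=1)
```

Under these assumptions the idealised profile spans five timesteps for eight instructions, an average parallelism of 1.6. Restricting the run to one processor stretches it to eight timesteps, and adding latency inserts idle timesteps between dependent instructions.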

4 Discussion of Results

4.1  Ease  of  Implementation 

Although Id is a functional language and does not allow
overwriting of data structures, a FORTRAN programmer
can easily learn how to program in Id. This algorithm was
implemented by the first author, who had to learn both the
Euler algorithm and Id. The algorithm was implemented in
approximately 400 lines of Id code. This same algorithm implemented
in a sequential FORTRAN takes approximately
300-000 lines of code, depending on the style of the programmer.


The key to programming in Id is understanding how to
write functions. Unlike most imperative languages,
arguments and results of functions can be arbitrarily complex
types: arrays, trees, lists, ... even functions [5]. For
example, an array may contain a list in each cell and a function
may return a function. This is important to remember in
the design and implementation of the algorithm.

4.3  Design  and  Implementation 

The aim of this project was to design a high level
implementation of the algorithm. The bulk of the computations
are in the four-stage Runge-Kutta loop, specifically in the
residual calculations. The algorithm specifies the
calculations primarily as vector manipulations, such as adding and
subtracting vectors. Therefore, a high level representation
would work with the vectors U, F, and G directly. This is
easily accomplished since data structures can be arbitrarily
complex. Thus, instead of representing U, F, and G in a
3D array, i.e., U[i,j,k] where (i,j) is the cell coordinate
and k is the coordinate in the vector, represent U, F, and
G as vectors in a 2D array, where each cell contains the 3
vectors. The required vector operations for our application
were adding vectors, subtracting vectors and multiplication
of a vector by a scalar. The array library could have been
used to perform the required vector operations; however, in
this case it was easier to define these functions.
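
The layout described above can be sketched in Python (not Id) as follows. The function names, the four-component vectors, and the sample combination at the end are assumptions for illustration only; they are not the residual formula of the solver. Each operation returns a new vector rather than overwriting its arguments, in keeping with the functional style.

```python
def vadd(a, b):
    """Component-wise sum of two vectors, returned as a new list."""
    return [x + y for x, y in zip(a, b)]


def vsub(a, b):
    """Component-wise difference of two vectors, returned as a new list."""
    return [x - y for x, y in zip(a, b)]


def vscale(s, a):
    """Product of a scalar and a vector, returned as a new list."""
    return [s * x for x in a]


def make_grid(ni, nj, cell):
    """2D array (list of rows) whose (i, j) entry is the value cell(i, j)."""
    return [[cell(i, j) for j in range(nj)] for i in range(ni)]


# Each cell stores the three vectors together as a (U, F, G) tuple.
# The component values are placeholders, not a physical initial condition.
grid = make_grid(17, 5, lambda i, j: ([1.0, 0.0, 0.0, 2.5], [0.0] * 4, [0.0] * 4))

U, F, G = grid[3][2]
# A whole-vector expression, with no explicit loop over the index k.
U_new = vsub(U, vscale(0.1, vadd(F, G)))
```

The point of the layout is visible in the last line: the update is written once per cell in terms of whole vectors, and the third index never appears in the calling code.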

4.5  Parallelism  Statistics 

Two sets of parallelism statistics were collected. First, the
Arithmetic Logical Unit (ALU) operations profile shows
the total number of instructions performed in parallel per
time step. The second profile collected is the number of
floating point operations performed in parallel per time
step.

Both subsonic and supersonic cases were run; however, all
statistics shown are for supersonic flow on a 17 x 5 grid. The
amount of parallelism in the algorithm can be assessed by
examining only a small number of iterations since the same
calculations are performed in each iteration after the first.
Therefore, all parallelism profiles are for three iterations
(this work was performed using GITA on the MEF, which
as stated above is prohibitively slow). Finally, all profiles
were run using Idealized Mode, thus showing the maximum
amount of parallelism possible.

The ALU operations and floating point operations
profiles are shown in Figure 4 and Figure 5, respectively. The
ALU operations profile shows that on average over 1,500
instructions can be performed in parallel per iteration. The
floating point operations profile also shows a high degree of
parallelism. The dips in the parallelism profiles show the
data dependencies between iterations. Within one iteration
there are no dips in the parallelism profiles; however, a
four-stage Runge-Kutta loop is executing. As shown before, the
Runge-Kutta loop has data dependencies between stages
and intuitively the method seems sequential. However, Id
is able to determine the maximum amount of parallelism
in the algorithm since parallelism is only inhibited by data
dependencies, and as can be seen the Runge-Kutta loop is
highly parallel.

4.4 Loop Bounding and Parallelism Profiles

From the ALU operations profile, the first iteration uses
a peak of over 13,000 processors while the later iterations
use 1,500 processors on average. This is due to all of the
loop information ‘unfolding’, i.e., all of the loops begin to
execute in parallel until they reach a data dependency. This
is clearly undesirable since this kind of parallelism will
consume all available resources. Therefore, we want to ‘bound’
the loop from unfolding too many iterations at once. In
this application, only two iterations at a time need to be
unfolded, the current and previous iterations. The previous
iteration is needed to update the current iteration.
Figure 6 and Figure 7 show the ALU operations and floating
point operations profiles with loop bounding. As can be
seen from these figures, each iteration including the first is
now identical. A peak of 10,000 processors can be used,
while an average of 2,000 processors is used per iteration.
Thus, loop bounding has more evenly distributed the work
over the processors.
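
The effect of bounding can be illustrated with a deliberately simplified model, sketched in Python below. The model is an assumption and not a description of how Id World implements loop bounding: each iteration is taken to consist of `setup` instructions that depend on nothing earlier, followed by one update instruction that needs its own setup and the previous iteration's update. With a bound of k, the setup of iteration i additionally waits for the update of iteration i-k.

```python
def unfolding_profile(iterations, setup, bound=None):
    """Idealized parallelism profile of a toy loop (hypothetical model).

    iterations: number of loop iterations
    setup:      independent instructions per iteration
    bound:      maximum number of iterations unfolded at once (None = unbounded)
    """
    update_at = []  # timestep at which each iteration's update fires
    events = []     # (timestep, number of instructions fired)
    for i in range(iterations):
        if bound is None or i < bound:
            setup_at = 0  # nothing holds the iteration back
        else:
            setup_at = update_at[i - bound] + 1  # wait for iteration i - bound
        previous = update_at[i - 1] if i > 0 else -1
        update_at.append(max(setup_at, previous) + 1)
        events.append((setup_at, setup))
        events.append((update_at[i], 1))
    profile = [0] * (max(t for t, _ in events) + 1)
    for t, count in events:
        profile[t] += count
    return profile
```

In this toy model, four iterations with three setup instructions each give a profile of 12, 1, 1, 1, 1 when unbounded and 6, 1, 4, 4, 1 with a bound of two: the peak is halved, the work is spread more evenly, and the number of timesteps does not change.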


4.6  I-Structure  Storage 

As mentioned previously, the updating of data structures
is accomplished through copying. Every vector operation,
such as adding two vectors, results in a new vector being
created. This is needed to guarantee determinacy; however,
this will clearly result in memory problems very quickly.
A profile of I-Structure storage versus timesteps was collected for
the case with loop bounding since each iteration is the same.
Therefore, the profile of I-Structure storage versus timesteps
shows how many bytes of memory are being used for each
iteration. As can be seen from Figure 8, each iteration requires
slightly over 50 kbytes of memory. Since every data
structure that is created can never be overwritten, these data
structures remain in memory until execution is complete.
The supersonic case requires approximately 400 iterations
for convergence; at 50 kbytes an iteration, that would
require 20 Mbytes of memory. Clearly, a facility to ‘release’
old data structures, i.e. data structures that are no longer
needed, is necessary.
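
The storage estimate is a simple product, reproduced here as a short Python check. The decimal definition of a kbyte and the function name are assumptions.

```python
KBYTE = 1000          # assumption: decimal kilobytes
MBYTE = 1000 * KBYTE


def total_storage(iterations, bytes_per_iteration):
    """Bytes held at the end of a run when no data structure is ever released."""
    return iterations * bytes_per_iteration


# Supersonic case: about 400 iterations at about 50 kbytes each.
supersonic_bytes = total_storage(400, 50 * KBYTE)
supersonic_mbytes = supersonic_bytes / MBYTE
```

Because nothing is reclaimed, the total grows linearly with the iteration count, so a case that needs several times as many iterations needs several times as much memory.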

Currently, the only means of releasing or reclaiming storage
used by old data structures is by explicit deallocation.
This  is  accomplished  with  the  following  syntax: 

@release garbage_data_structure


@release can be placed anywhere within a function, but it
will not be invoked until just before the function returns
[6]. Therefore, within the function, @release will not cause
errors because it is not possible for a computation within
the function to access the garbage data structure after it is
released. However, a problem arises when an @release is
used within a function but external to that function there
is some dependency on that released data structure. The
following example clearly demonstrates the problem:

def add_vectors A B =
  { @release A, B ;
    in
      {array (1,4)
        | [j] = A[j] + B[j] || j <- 1 to 4}};

M = R[3];

T = add_vectors R S;

In this example, once the two vectors, A and B, are added in
the function add_vectors, those vectors are released;
however, the computations to initialize M and T are running in
parallel. If the call to add_vectors finishes before R[3] is
accessed then R will have been released and an error will
result because an attempt will be made to access a garbage
data structure; however, if R[3] is accessed before the call
to add_vectors finishes then no error will occur. Thus,
explicit release can introduce non-determinism into a program.
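
The two outcomes can be reproduced deterministically by simulating both orders of execution. The Python sketch below is an analogy only: the `IStructure` class, the `Released` exception, and the `run` helper are hypothetical and do not model the real I-structure memory. Each of the two parallel computations is treated as an atomic task and the tasks are run in a chosen order.

```python
class Released(Exception):
    """Raised when a released structure is read (hypothetical error model)."""


class IStructure:
    def __init__(self, values):
        self._values = list(values)
        self._released = False

    def __getitem__(self, j):
        if self._released:
            raise Released("access to a released data structure")
        return self._values[j - 1]  # 1-based, as in the Id example

    def release(self):
        self._released = True


def add_vectors(a, b):
    result = IStructure(a[j] + b[j] for j in range(1, 5))
    a.release()  # the release takes effect just before the function returns
    b.release()
    return result


def run(order):
    """Run the tasks 'M' and 'T' in the given order and report what each saw."""
    r = IStructure([1, 2, 3, 4])
    s = IStructure([10, 20, 30, 40])
    outcome = {}
    for task in order:
        try:
            if task == "M":
                outcome["M"] = r[3]
            else:
                outcome["T"] = add_vectors(r, s)[3]
        except Released:
            outcome[task] = "error"
    return outcome
```

Running M before T yields a value for both, while running T before M makes the read of the third element fail. The program text is identical in both cases; only the schedule differs, which is exactly the loss of determinacy described above.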

4.6 Sources of Excess I-Structures

With this in mind, a key question arises: “Where is this
‘garbage’ coming from and how easily and effectively can
it be explicitly deallocated?”

Garbage is created from three primary sources: vector
operations, temporary vectors within functions, and
tuples. Vector operations account for approximately 40% of
all garbage, temporary vectors within functions account for
25% and tuples account for 30%. The remaining 5% is
categorized as ‘other’.

The task of minimizing the number of I-structures being
created and subsequent deallocation of old I-structures is
not extremely difficult, but can be time-consuming. More
importantly, in this application explicit sequentialization
had to be imposed in two functions in order to release old
data structures. This is a major disadvantage of @release.

After implementing the necessary reduction and
deallocation options for data structures, the resulting I-Structure
storage profile is shown in Figure 9. The amount of garbage
is only 1000 bytes per iteration, 2% of the original amount. Therefore,
running the supersonic case to convergence would require
500 kbytes of memory, while the subsonic case, which
requires approximately 3,000 iterations for convergence, would
still need 3 Mbytes of memory. The ALU operations and
floating point operations profiles are shown in Figure 10
and Figure 11, respectively. These profiles show that a
significant amount of sequentialization occurs due to the
releasing of old data structures. In fact, the number of computation
timesteps is almost double the number required for the case
with loop bounding. Therefore, although explicit
deallocation is very effective, the loss of parallelism is highly
undesirable. In addition, for CFD applications the amount of
garbage must essentially be zero for practical applications.

4.7  Garbage  Collectors 

What is needed is a facility to automatically deallocate
old data structures. LISP systems use a ‘garbage collector’
to deallocate old data structures once a certain amount of
memory has been filled. Designing a garbage collector for
Id to deallocate basic data structures such as arrays, lists,
and tuples is not a difficult task. However, the Computation
Structures Group is working on the more general
problem of how to determine when an arbitrarily complex data
structure can be released. Using a combination of explicit
deallocation and a garbage collector for Id would efficiently
deallocate all old data structures and impose much less
sequentialization. Thus, a garbage collector is essential for
running CFD applications.

5  Summary 

A 2D Euler solver for the flow over a circular arc bump
was implemented in Id World. Id World is a distinct
approach to parallel processing that uses a functional
language, Id, and a Tagged-Token Dataflow Architecture. Id is
a powerful, general-purpose language with data structures
suitable for numerical problems. Algorithms can easily be
expressed with a high-level representation in Id. Vector
manipulations are greatly facilitated by array comprehensions
and the array library.

The TTDA, a ‘data-driven’ machine, is inherently
parallel since it directly executes dataflow graphs. The Monsoon
is a hardware implementation of the TTDA with extensions
that have made dataflow architectures practical. With the
implementation of a garbage collector, practical CFD
problems can be solved.

In conclusion, Id World is an excellent tool for analyzing
the implicit parallelism in algorithms and for developing
new parallel algorithms.
INTRODUCTION

Considerable progress has been made in recent years in the development of
numerical methods for the simulation of viscous flows. Various approximations to the
Navier-Stokes  equations  have  been  used  and  different  numerical  schemes  have  been 
developed  for  the  prediction  of  viscous  fields.  The  traditional  approach  solves  a 
specific  set  of  equations  on  a  single  grid  using  the  same  numerical  scheme  over  the 
entire domain. The selection of the equations, the scheme and the grid are
determined a priori by the user before starting the solution procedure. However, quite
often  some  or  all  of  the  above  factors  must  be  modified  by  the  user  in  order  to 
improve  the  results. 

The robustness of current numerical schemes as well as present computer
capabilities has allowed a dramatic change in this philosophy. General algorithms have
been  developed  which  are  flexible  enough  to  adaptively  adjust  the  equations,  the 
grid  and  even  the  scheme  during  the  solution  procedure  without  intervention  by  the 
user.  In  other  words,  the  algorithm  makes  decisions  with  respect  to  which  equations, 
grids  or  schemes  are  to  be  used  as  the  computation  proceeds. 

First  we  discuss  several  adaptation  methods  as  well  as  other  concepts  which  can 
be  applied  for  the  efficient  simulation  of  fluid  flows,  emphasizing  those  which  are 
viscous.  The  reasoning  that  leads  to  developing  adaptive  algorithms  is  examined, 
and  the  methods  of  grid,  equation,  and  temporal  adaptation  are  described.  Next, 
the  problem  of  detecting  the  flow  features  of  interest,  the  communication  between 
different  grids  and  the  application  of  a  turbulence  model  are  discussed.  Lastly,  the 
special  coding  requirements  for  an  adaptive  algorithm,  example  problems,  as  well 
as  an  evaluation  of  the  methods  in  terms  of  accuracy  and  efficiency  are  presented. 


NEED  FOR  ADAPTIVE  ALGORITHMS 


Most  of  the  current  schemes  (especially  those  used  for  complex  transonic  flow 
problems) are of low order accuracy. Higher order methods often require a large
computational molecule which complicates the algorithm and poses a serious problem
when  applying  boundary  conditions.  Numerical  schemes  of  low  order  of  accuracy  are 
usually  used  when  shocks  are  present  in  order  to  capture  them  without  oscillations. 
The use of a higher order scheme also may result in extra computation and
storage of additional quantities other than state variables (e.g. stresses). Imposing high
spatial  accuracy  often  leads  as  well  to  severe  restrictions  on  the  size  of  the  time-step. 
Lastly,  the  order  of  smoothing  operators  that  are  currently  used  degenerates  when 
non-smooth  grids  are  used.  Therefore,  in  practice  schemes  are  at  most  second  order 
accurate  in  space  and  time.  However,  the  accuracy  is  grid  dependent  and  schemes 
often  become  first  order  or  even  inconsistent  if  the  grid  is  stretched,  skewed,  etc. 
As  a  consequence  increased  resolution  is  needed  in  the  vicinity  of  features  to  ensure 
accurate  predictions.  A  boundary  layer,  for  example,  requires  at  least  15  points 
across  its  thickness  and  the  point  adjacent  to  the  wall  may  lie  at  a  distance  of  order 
10^-6 for a high Reynolds number flow. The resolution constraint, orthogonality and
other  grid  quality  requirements  combine  to  make  grid  generation  very  difficult.  The 
problem  becomes  more  severe  in  3-D  where  the  generation  of  body-fitted,  nearly 
orthogonal  grids  is  very  difficult. 

The  initial  approach  to  alleviate  the  grid  problem  has  been  a  modification  of  the 
initial  grid  during  the  computation  by  redistribution  of  the  grid  points  such  that 
more  points  are  clustered  in  the  regions  of  interest  [18,17,7].  However,  the  number 
of  grid  points  is  fixed  and  clustering  in  one  region  results  in  less  resolution  in  other 
regions.  Another  drawback  of  the  method  is  that  it  frequently  results  in  skewed  and 
stretched  grids  which  deteriorate  the  accuracy  of  the  integration  scheme. 

Another  method,  which  will  be  described  in  some  detail  here,  is  the  use  of  a 
coarse  grid  initially  into  which  are  embedded  finer  grids  in  those  local  regions  with 
large  flow  gradients  (e.g.  boundary  layers,  shocks,  wakes,  etc).  Several  levels  of 
finer  grids  can  be  used,  and  can  be  limited  to  those  regions  of  the  domain  where 
important  features  exist.  A  special  feature  detection  algorithm  can  be  introduced 
in  order  to  place  the  grids  adaptively  during  the  course  of  the  solution  procedure. 

Within a flow domain the dominant physics differs from region to region since
the flow often exhibits a variety of flow phenomena with different characteristics,
such as shocks and shear layers. These regions can be described by different sets
of  equations.  For  example,  the  Navier-Stokes  equations  may  be  necessary  in  a 
boundary  layer  but  the  Euler  equations  actually  suffice  to  describe  the  physics  for 
an  inviscid  part  of  the  flow.  An  expensive  Navier-Stokes  calculation  need  not  be 
carried  out  over  the  entire  domain.  An  adaptive  algorithm  which  can  sense  the 
presence  of  specific  physical  phenomena,  and  which  uses  the  appropriate  governing 
equations  or  solver,  seems  to  be  advantageous  in  terms  of  CPU  time  savings. 

It  is  clear  that  an  algorithm  which  adapts  itself  to  the  developing  solution  by 
means  of  modifying  the  grid,  the  equations  and/or  perhaps  the  scheme  itself,  can 
offer  flexibility  and  economy. 

ADAPTIVE  LOCAL  GRID  REFINEMENT 

The  objective  of  adaptive  grid  refinement  is  to  adjust  the  grid  scale  in  regions 
where  extra  resolution  is  needed  (Fig.  1). 

The  resulting  embedded  grids  are  topologically  similar  to  the  initial  grid  and 
so  maintain  its  geometric  properties  (e.g.  stretching,  orthogonality),  but  are  not 
necessarily  aligned  to  the  initial  grid  as  the  embedded  meshes  ‘follow’  the  features. 
The  process  can  be  repeated  any  number  of  times  and  results  in  finer  and  finer 
local  embedded  grids  until  a  region  is  adequately  resolved.  This  approach  has  been 
employed for the Euler equations [6,4,12,15], and for Navier-Stokes [10,11,9].
However, the existence within the same domain of features with specific but different
orientations has implications for the way in which an initial grid is embedded.
Division of a cell in both directions creates three additional cells and frequently results
in unnecessary resolution in regions where significant flow gradients exist primarily
in a single direction. For example, in a boundary layer extra resolution usually
is needed in the crossflow direction, but not streamwise. The advantages offered
by directional cell division (Fig. 2) are therefore clear: only one additional cell is
created instead of three, and this results in significant savings in computation time
and storage [10].

Figure 2: Directional embedding
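
The cell counts quoted above follow directly from the geometry. The Python sketch below assumes axis-aligned rectangular cells stored as (x0, y0, x1, y1) tuples; the representation and the function names are hypothetical and much simpler than the body-fitted cells of an actual solver.

```python
def divide(cell, direction):
    """Split a rectangular cell (x0, y0, x1, y1) at its midpoint(s).

    direction is "x", "y", or "both" (hypothetical labels).
    """
    x0, y0, x1, y1 = cell
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    if direction == "x":
        return [(x0, y0, xm, y1), (xm, y0, x1, y1)]
    if direction == "y":
        return [(x0, y0, x1, ym), (x0, ym, x1, y1)]
    if direction == "both":
        return [(x0, y0, xm, ym), (xm, y0, x1, ym),
                (x0, ym, xm, y1), (xm, ym, x1, y1)]
    raise ValueError(f"unknown direction: {direction}")


def additional_cells(cell, direction):
    """Number of cells created beyond the one that was divided."""
    return len(divide(cell, direction)) - 1
```

Dividing in both directions replaces one cell by four (three additional cells), whereas dividing only across the layer replaces it by two (one additional cell), so a region refined directionally carries half as many cells per refinement level.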

Local  embedding  implies  two  important  consequences  for  any  basic  solver  that  is 
used.  First,  the  mesh  now  becomes  unstructured  and  the  usual  ij  indexing  can  no 
longer  be  used.  Instead,  the  solver  sweeps  through  cells  and  the  necessary  operations 
are  restricted  to  within  each  cell.  A  system  is  required  to  keep  track  of  all  the  needed 
information  for  each  cell  (pointer  system),  and  will  be  described  below.  Second, 
there  is  an  implied  communication  between  the  grids.  The  borders  between  grids  of 
different  refinement  levels  (interfaces)  must  receive  special  attention. 

EQUATION  ADAPTATION 

The Navier-Stokes equations apply for most flow fields of engineering interest.
Frequently, not all of their terms are needed to model the flow physics. The viscous
terms in fact are expensive to compute but often are negligible over large parts of
the domain. A suggested approach introduces a criterion based on the magnitude of
the viscous stresses in order to allow the algorithm to decide where the full
Navier-Stokes system is required and where a subset system (e.g. the Euler equations)
would  be  adequate.  The  border  between  two  such  regions  is  dynamically  defined 
by  the  algorithm  and  may  change  during  the  course  of  the  solution  procedure  [10]. 
Figure  3  illustrates  the  concept  for  a  shock/boundary  layer  interaction  problem. 
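
A criterion of this kind amounts to a per-cell test. The Python sketch below assumes that a single non-negative stress magnitude is available for each cell and that one fixed threshold is used; the names, the labels, and the threshold value are hypothetical and are not taken from [10].

```python
def split_domain(stress_magnitude, threshold):
    """Label each cell by the equation set it needs (hypothetical criterion).

    stress_magnitude: one non-negative viscous stress magnitude per cell
    threshold:        value above which the viscous terms are retained
    """
    return ["navier-stokes" if tau > threshold else "euler"
            for tau in stress_magnitude]


# Stresses decaying away from a wall: only the near-wall cells stay viscous.
labels = split_domain([0.9, 0.4, 0.05, 0.001, 0.0], threshold=0.01)
```

Re-evaluating the labels as the solution develops is what lets the border between the two regions move during the computation.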

This procedure essentially decouples the solver from the grid. Different
integrators can be used in different regions of the domain and splitting of the domain
offers flexibility in defining an integration strategy. Not all of the regions need be
visited during each sweep. A shear layer develops much more slowly during the
solution procedure due to the very small time steps used there, while the outer
inviscid flow advances rapidly to a steady state. Especially in cases without any
strong viscous/inviscid interactions, a strategy can be adopted which integrates the
viscous regions a number of times for each inviscid region integration.

Figure 3: Adaptive splitting of the domain into viscous, inviscid regions

Such  viscous/inviscid  decomposition  of  the  domain  does  not  create  additional 
interface  problems  since  the  same  solver  is  used  but  with  the  negligible  viscous 
terms  omitted  in  the  inviscid  region.  However,  interfaces  may  exist  for  cases  in 
which  different  schemes  are  employed  in  each  of  the  regions.  The  adaptive  domain 
decomposition  approach  may  be  extended  to  more  than  two  regions.  For  example, 
Euler  equations  may  be  applied  in  the  inviscid  region(s),  the  full  Navier-Stokes 
in  viscous  regions  where  significant  streamwise  diffusion  exists,  and  the  thin  layer 
Navier-Stokes  in  those  boundary  layer  regions  with  negligible  streamwise  diffusion. 

TEMPORAL  ADAPTATION 

For  time-accurate  explicit  methods,  the  size  of  the  time  step  is  restricted  by 
numerical  stability  considerations  such  as  the  CFL  condition.  There  are  situations, 
however, in which the time step constraint is governed by physics rather than
stability considerations. This can occur when there are significant time gradients which
must be resolved, and then a procedure similar to that for spatial embedding may be
used. The algorithm monitors the time-gradients within each cell and, when above
a  specified  threshold,  the  time  step  is  reduced.  Both  time  and  spatial  interfaces 
appear  and  generally  differ  from  one  another  [13]. 
Figure 4: Time steps in adjacent embedded zones differ by a factor of two

In order to simplify the procedure, the time steps that are allocated to each
cell may be grouped into temporal zones. All cells within a temporal zone then
are integrated with the same time step, and the time steps of adjacent temporal zones
differ by a factor of two, as illustrated in Figure 4. The maximum time step in
the domain is therefore 2^n times larger than the minimum one, where n is the number of
embedding zones. The cells of temporal level 1 are integrated in time twice using
a time step δt before the cells of temporal level 0 are integrated once using a time
step of 2δt.
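
The ordering of the integrations generalizes naturally to more than two temporal levels through recursion. The Python sketch below only records the schedule, i.e. which level is integrated with which time step and in what order; the function name and the representation of the schedule as a list of (level, time step) pairs are assumptions.

```python
def advance(level, finest_level, dt, schedule):
    """Advance `level` and every finer level through a time interval dt.

    Finer levels take two half-size steps before the coarser level takes
    its single step. Each step is appended to `schedule` as (level, dt).
    """
    if level < finest_level:
        advance(level + 1, finest_level, dt / 2, schedule)
        advance(level + 1, finest_level, dt / 2, schedule)
    schedule.append((level, dt))
    return schedule


two_levels = advance(0, 1, 2.0, [])
three_levels = advance(0, 2, 4.0, [])
```

With two levels the schedule is level 1 twice with the small step, then level 0 once with twice that step. With three levels the finest zone takes four steps of size 1.0 while the coarsest takes one step of size 4.0, so every level covers the same time interval and the largest step is 2^2 times the smallest.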

This  procedure  allows  a  spatial  variation  of  the  time  steps  while  simultaneously 
maintaining  time  accuracy.  A  further  simplification  insists  that  temporal  interfaces 
coincide with the spatial interfaces. This can be accomplished by taking into
account the fact that spatial adaptation (cell subdivision to resolve spatial gradients)
often  results  in  temporal  adaptation  (reduction  of  the  time  step  to  resolve  temporal 
gradients)  as  well.  Large  temporal  gradients  may  exist  in  regions  where  features  are 
present.  Figure  5  illustrates  the  concept  with  the  temporal  zones  coinciding  with 
the  spatial  ones.  This  splitting  of  the  time-steps  according  to  the  embedded  zones 
saves  CPU  time  since  not  all  of  the  cells  are  marched  in  time  with  the  minimum 
time-step  that  is  found  for  the  entire  domain.  Instead,  only  those  cells  that  are  in 
the  embedding  zone  containing  the  cell  with  the  minimum  time-step  are  integrated 
with that minimum time-step; the remaining cells, which lie in other zones, are
marched at time-steps that are multiples of the globally smallest time-step.

ADAPTIVE  BLOCKS 

Two major constraints are imposed on the integration scheme by these
procedures. The first is that explicit solvers should be used with an unstructured mesh.
The presence of interfaces in the domain makes the formulation of an implicit scheme
quite difficult. The second constraint restricts all operations to using information
from within each cell to upgrade properties for that cell. Most current schemes,
such as that in the preceding finite volume chapter, satisfy these constraints, which
therefore do not seem to impose a serious limitation on the use of adaptation procedures.

Figure 5: Spatially varying time steps

However, there are applications for which an implicit scheme may be more
advantageous. In such a case, the adaptive grid philosophy can still be used by defining
structured groups of cells (blocks). These can be formed by completely embedding
portions of the domain rather than embedding individual cells. A way of
constructing such blocks is by using the ‘embedding patches’ concept which is described in the
feature  detection  section.  Within  each  block  an  implicit  solver  may  be  used.  The 
block  approach  can  apply  to  only  a  specific  region  of  the  domain  (e.g.  a  boundary 
layer,  which  can  be  considered  as  a  single  block),  while  the  totally  unstructured  grid 
can  be  employed  for  the  rest  of  the  field. 

ADAPTATION  STRATEGIES 

Consider  how  the  described  adaptation  methods  can  be  combined  into  a  single 
procedure  during  a  computation.  A  typical  sequence  of  steps  for  the  steady-state 
algorithm  is: 

1.  Solve  the  Navier-Stokes  equations  on  an  initial  coarse  grid. 

2.  Monitor  the  residual  error  until  it  falls  below  a  prespecified  value;  detect 
the  main  flow  features  and  the  borders  between  the  viscous  and  the  inviscid 
regions;  refine  the  grid  locally  if  the  detection  parameter  exceeds  the  threshold 
for  division,  or  remove  grids  if  it  is  less  than  the  threshold  for  removing  grids. 

3.  Continue  computing  on  the  updated  grid  using  the  Navier-Stokes  equations 
only  within  the  viscous  region  and  Euler  everywhere  else. 
4. Repeat steps 2,3 for a specified number of adaptation cycles.

5. March solution to steady state.

Figure 6: Convergence history with three levels of adaptation
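
The five steps can be read as a driver loop around three operations. The Python sketch below shows only that control flow: `solve`, `detect`, and `adapt` are hypothetical callbacks supplied by the caller, their signatures are assumptions, and no claim is made that an actual adaptive code is organized this way.

```python
def run_adaptation(grid, solve, detect, adapt, cycles, adapt_tol, steady_tol):
    """Control flow of the steady-state adaptation sequence (a sketch).

    solve(grid, regions, tol) -> solution, iterating until the residual < tol;
                                 regions=None means Navier-Stokes everywhere
    detect(grid, solution)    -> (features, regions)
    adapt(grid, features)     -> new grid with cells divided or removed
    """
    regions = None
    solution = solve(grid, regions, adapt_tol)       # step 1
    for _ in range(cycles):                          # step 4
        features, regions = detect(grid, solution)   # step 2: features and borders
        grid = adapt(grid, features)                 # step 2: refine or remove grids
        solution = solve(grid, regions, adapt_tol)   # step 3
    solution = solve(grid, regions, steady_tol)      # step 5
    return grid, solution
```

Keeping the three operations behind callbacks reflects the decoupling the method relies on: the same loop applies whether the detection parameter is a gradient, a stress magnitude, or an error estimate.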

Figure 6 shows the rms residual decay to a steady-state during which three
adaptations were completed. The residual peaks after embedding due to interpolation
error  when  allocating  values  of  the  state  variables  to  the  newly  created  nodes.  The 
jump  magnitude  becomes  less  as  the  grids  become  finer. 

Of  course,  the  above  adaptation  strategy  is  not  unique.  It  can  and  should  be 
modified  to  make  it  more  suitable  for  specific  areas  of  application.  For  example, 
if interest is in forces acting on an airfoil, the embedding process logically can be
related to the grid convergence for lift and drag. In that case the solution should be
marched to a provisional steady-state for lift, for example, after each adaptation,
instead  of  tracking  the  residual  decay. 

Additional techniques may be included depending upon the nature of the special
features. One such technique involves the use of pre-embedding. The initial grid is
embedded in a certain local region before the computation starts in order to meet
a specific need. For example, in a shock tube problem an accurate prescribing of
the initial shock is accomplished with embedding. Similarly, when a vortex needs
to be prescribed in a region as an initial condition, the initial coarse mesh may
be inadequate to prescribe it accurately and therefore the vortex region may be
pre-embedded.  After  initiating  the  computation  the  embedding  may  no  longer  be 
needed  or  be  required  elsewhere  due  to  the  feature’s  motion.  The  algorithm  then 
automatically  removes  the  pre-embedding. 


FEATURE  DETECTION 


The  detection  of  individual  features  and  of  different  regions  is  an  essential  part 
of  all  adaptation  algorithms.  The  adaptation  algorithm  must  be  able  to  sense  the 
existence  and  track  the  evolution  of  such  features. 

There are two basic approaches in detecting features. The first is based on
truncation error estimates [2,5]. Since a primary objective of adaptation is minimization
of  this  error,  it  is  evident  that  direct  detection  of  those  regions  for  which  the  error 
is  large  is  a  most  suitable  guide.  Unfortunately,  the  truncation  error  is  not  known 
directly.  The  order  of  the  numerical  scheme  that  is  used  in  every  region  is  required, 
and  the  solution  on  two  different  grids  must  be  examined  along  with  the  scheme’s 
order  of  accuracy,  in  order  to  estimate  the  truncation  error.  The  order  of  accuracy 
of  the  scheme,  however,  is  not  known  exactly  since  it  is  usually  grid  dependent  and 
is  not  the  same  over  the  entire  domain.  In  a  shock  region,  for  example,  a  second 
order  scheme  usually  reduces  to  first  order.  Another  drawback  of  the  truncation 
error  method  is  the  high  cost  of  error  computations.  This  is  undesirable  especially 
for  unsteady  evaluations  in  which  the  detection  process  is  applied  frequently. 
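
One standard way of combining two grids with an assumed order of accuracy is a Richardson-type estimate, sketched below in Python. It is offered as an illustration of the ingredients listed above, not as the procedure of [2,5]: strictly, it estimates the error in the fine-grid solution, and it inherits the weakness noted in the text, namely that the order p must be supplied and may not hold locally. The function name and argument layout are assumptions.

```python
def richardson_error(u_fine, u_coarse, order, ratio=2):
    """Estimate (exact - fine) at points shared by a fine and a coarse grid.

    Assumes the leading error behaves like C * h**order and that the
    coarse spacing is `ratio` times the fine spacing.
    """
    factor = ratio ** order - 1
    return [(uf - uc) / factor for uf, uc in zip(u_fine, u_coarse)]


# Manufactured check: exact value 1, second-order error with C = 1.
h = 0.1
u_h = [1.0 + h ** 2]          # fine-grid value
u_2h = [1.0 + (2 * h) ** 2]   # coarse-grid value
estimate = richardson_error(u_h, u_2h, order=2)
```

In the manufactured case the estimate recovers the true fine-grid error of about -0.01. If the scheme locally drops to first order, as near a shock, the same formula with order 2 underestimates the error by a factor of three, which illustrates why an uncertain order of accuracy is a drawback of this approach.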

The  basis  for  the  second  approach  is  detection  of  regions  where  significant  flow 
variations  exist  [6,10].  Of  course,  the  truncation  error  depends  upon  field  variations. 
It  is  negligible  in  the  regions  where  such  derivatives  are  small  and  maximum  in 
regions  of  larger  flow  variations.  Therefore,  it  is  reasonable  to  track  regions  of 
large  flow  gradients  and  to  resolve  them,  since  they  are  generally  the  sites  at  which 
flow  features  exist.  Detection  of  these  regions  essentially  leads  to  detection  of  the 
flow  features.  Viscous  flow  fields  exhibit  a  great  variety  of  features  with  different 
character,  length  scales,  and  orientation.  A  shock  ideally  has  zero  thickness  while 
a  boundary  layer  basically  has  a  finite  thickness.  Also,  shocks  can  have  different 
orientations,  shear  layers  very  often  follow  surface  boundaries,  while  a  wake,  on 
the other hand, frequently takes the direction of the mean flow. Both the orientation
and the location of the features may change appreciably in the course of a
computation.  In  a  steady-state  computation,  the  gradual  thickening  of  a  boundary 
layer  may  cause  the  shock  to  move  appreciably,  while  oscillations  of  the  free  stream 
may  change  the  orientation  of  a  wake  considerably. 

Feature  detection  is  grid  dependent.  A  very  coarse  grid  ‘sees’  even  very  mild  flow 
variations  while  a  fine  one  may  ‘ignore’  large  changes.  This  is  the  driving  force  of 
both spatial and temporal adaptation. The grid is adjusted so that it ‘sees’ a relatively
smooth solution, which means low truncation error. The choice of appropriate
feature detection parameters is guided by the physical nature of the flow. Across a
boundary  layer  the  viscous  stresses  clearly  are  important  and  informative,  while  the 
approximately  constant  pressure  is  irrelevant.  On  the  other  hand,  pressure  gradients 
are  good  indicators  of  a  shock  region.  Entropy  and  Mach  number  variations  are  not 
good  indicators  of  weak  shocks.  Conversely,  density  differences  are  good  indicators 
of  both  shocks  and  boundary  layers,  but  clearly  inappropriate  for  incompressible 
flow.  Depending  upon  the  kind  of  problem  numerous  other  detection  parameters 
may  be  employed.  Temperature,  for  example,  may  be  quite  effective  in  detecting 
the overheated regions in the first stage of a turbine and may result in good resolution
of regions where accurate heat transfer prediction is important. Another
class of parameters that may be applied are those which use information from the
field geometry. For example, if complete resolution of the trailing edge region of
an airfoil is desired, distance from the trailing edge can be used as a criterion for
adapting that area, and a detection parameter $\Delta U$ may be modified to $\Delta U \cdot R_{te}$,
with $R_{te}$ being the distance from the trailing edge. It is quite clear that no one
universal  parameter  applies  for  all  applications,  and  that  more  than  one  parameter 
may  be  useful  to  achieve  accurate  feature  detection. 

The  form  of  the  detection  parameter(s)  is  another  choice  that  must  be  made  in 
carrying out the feature detection procedure. There are two main forms: undivided
differences and divided differences (gradients), and each may be of first or higher order. An
undivided difference can be the difference in the detection parameter between the
opposite corners (NE, SE and NW, SW) of a cell:

$$\delta U = (U_{NE} + U_{SE}) - (U_{NW} + U_{SW}).$$

A divided difference is formed by normalizing the above difference by a characteristic
cell dimension, e.g.

$$\frac{\delta U}{\delta X} = \frac{(U_{NE} + U_{SE}) - (U_{NW} + U_{SW})}{(X_{NE} + X_{SE}) - (X_{NW} + X_{SW})}.$$

Again the choice of the form is problem dependent. For example, first order gradients
(stresses) provide good resolution for the inner part of a boundary layer but
leave the outer edge poorly adapted. Second order gradients are needed to resolve
the latter. The cells within a viscous layer are generally smaller than those in the
inviscid regions. As a consequence, an undivided difference of a detection parameter
can be of the same magnitude for both inviscid and viscous cells. Therefore,
divided differences are appropriate for proper capture of shear layers. On the other
hand, use of parameter gradients when detecting shock regions leads to increasing
gradients after each adaptation, and most probably to overadaptation of the shock
region while the rest of the flow domain is ignored. Thus, a combination of
divided differences for shear layers and undivided differences for shocks is appropriate.
For example, if $(\delta U/\delta X)_{cell} > (\delta U/\delta X)_{th}$ or $(\delta U)_{cell} > (\delta U)_{th}$,
where $(\delta U/\delta X)_{th}$ and $(\delta U)_{th}$
refer to threshold values, then the cell is divided in the X-direction. Still another
consideration in the choice of detection parameters is the amount of computational
effort, which may be important when the detection process is repeated frequently.
Figures 7 and 8 show the performance of various criteria in the detection of a
boundary layer and a shock separately in a channel flow with a bump. Fig. 7 illustrates
a subsonic flow field with a boundary layer being the main feature, while
Fig. 8 shows the same field in supersonic flow. In the latter case a shock is formed
at the leading edge and is reflected at the upper boundary. It is evident that the
use of density leads to an excessive number of embedded cells. As was mentioned,
Mach number is a poor indicator of weak shocks compared to pressure and velocity
differences, which perform quite well.

The feature detection process requires specific threshold values for each detection
parameter. One can use statistics for the determination by calculating the
average ($\bar{\Phi} = \frac{1}{N}\sum_{i=1}^{N} \Phi_i$) and standard deviation ($\sigma_\Phi$) of the values of
the detection parameter $\Phi$ over all $N$ cells in the domain. The threshold then is
defined as the average parameter value plus a fraction ($\alpha$) of the standard deviation,
$(\Phi)_{th} = \bar{\Phi} + \alpha \sigma_\Phi$ (see Fig. 9).

The value of the coefficient $\alpha$ is chosen empirically, and a typical value is 0.4.
The detection process is not critically dependent upon the choice of this coefficient.
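
The detection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the standard deviation is taken in its population form (division by N), and the caller is assumed to pass one non-negative detection value per cell. None of these details are taken from the original code.

```python
import math


def undivided_difference(u_ne, u_se, u_nw, u_sw):
    """Undivided difference of a detection parameter across one cell."""
    return (u_ne + u_se) - (u_nw + u_sw)


def divided_difference(u_ne, u_se, u_nw, u_sw, x_ne, x_se, x_nw, x_sw):
    """Undivided difference normalized by the corresponding X-extent of the cell."""
    dx = (x_ne + x_se) - (x_nw + x_sw)
    return undivided_difference(u_ne, u_se, u_nw, u_sw) / dx


def threshold(values, alpha=0.4):
    """Mean of the detection values plus alpha standard deviations."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by N, not N - 1).
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean + alpha * sigma


def flag_cells(values, alpha=0.4):
    """Return one boolean per cell: True where the cell should be divided."""
    th = threshold(values, alpha)
    return [v > th for v in values]
```

For five cells with detection values 0, 0, 0, 0 and 10, the mean is 2 and the standard deviation is 4, so with the typical coefficient of 0.4 the threshold is 3.6 and only the last cell is flagged.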

After  selecting  those  cells  to  be  adapted,  a  smoothing  procedure  can  be  applied 
to the grid. The procedure aims at eliminating spurious flagged cells, which are
essentially ‘noise cells’, and at recovering cells which should have been detected but were
overlooked by the detection. Figure 10 shows characteristic situations in which
the  grid  has  both  ‘holes’  and  ‘islands’.  Simple  rules  can  be  constructed  so  as  to 
eliminate  these  situations. 

Another way of smoothing the grid is the use of ‘embedded patches’ [11]. An
embedded patch encloses a defined, fixed number of cells of the initial mesh. It
essentially scans the domain and during the scan the included cells are examined
for each patch. If the majority of its cells (e.g. 90%) are flagged for division, then
all cells currently belonging to the patch are flagged for division. Conversely, if
very few cells (e.g. 10%) are flagged, none within the patch are embedded. When
repeated throughout the domain, the patch procedure can be viewed conceptually
as a ‘searching window’, and acts somewhat like a ‘noise filter’ which reduces the
number of ‘randomly’ embedded cells.

Figure 7: Detection parameter influence on adaptation for subsonic boundary layer

(a) velocity differences; (b) pressure differences; (c) Mach number differences

Figure 8: Detection parameter influence on adaptation for a shock

Figure 11: Interruption of normal lines by grid interfaces
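
A minimal sketch of such a patch filter is given below, assuming a rectangular initial mesh whose flags are stored as a list of rows and assuming non-overlapping square patches that tile the domain. The function name `apply_patches`, the patch size of 4x4 cells and the way the window steps across the mesh are illustrative choices, not details of the procedure in [11].

```python
def apply_patches(flags, patch=4, hi=0.9, lo=0.1):
    """Filter division flags patch by patch.

    flags: list of rows of booleans on the initial mesh.
    A patch with at least `hi` of its cells flagged becomes fully flagged;
    a patch with at most `lo` of its cells flagged is cleared.
    Returns a new list of rows; the input is left unchanged.
    """
    nrow, ncol = len(flags), len(flags[0])
    out = [row[:] for row in flags]
    for r0 in range(0, nrow, patch):
        for c0 in range(0, ncol, patch):
            cells = [(r, c)
                     for r in range(r0, min(r0 + patch, nrow))
                     for c in range(c0, min(c0 + patch, ncol))]
            frac = sum(flags[r][c] for r, c in cells) / len(cells)
            if frac >= hi:
                fill = True      # fill 'holes' inside a mostly flagged patch
            elif frac <= lo:
                fill = False     # remove 'islands' in a mostly unflagged patch
            else:
                continue         # mixed patch: leave the flags as detected
            for r, c in cells:
                out[r][c] = fill
    return out
```

With 16 cells per patch, a single unflagged cell in an otherwise flagged patch (15/16) is filled, and a single flagged cell in an otherwise empty patch (1/16) is removed, while a half-flagged patch is left untouched.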

AN  ALGEBRAIC  TURBULENCE  MODEL  WITH  UNSTRUCTURED  GRID 

The  implementation  of  an  algebraic  turbulence  model  (e.g.  Baldwin-Lomax 
or  Cebeci-Smith)  in  the  presence  of  unstructured  grids  introduces  difficulties.  The 
models  implicitly  assume  a  structured  mesh,  and  the  implementation  is  usually  along 
lines  perpendicular  to  the  surface.  With  an  unstructured  mesh  continuous  normal 
mesh  lines  do  not  necessarily  exist  because  of  interrupting  interfaces  (Fig.  11),  and 
clearly  the  models  cannot  be  directly  applied  along  lines  such  as  AB  and  CD.  This 
may be overcome by applying the models in a ‘cell-wise’ manner, based on parameters
known at the center of each cell. In this way we avoid using information
from outside the cell, which is otherwise a common procedure when dealing with
embedded meshes generally [10,9].

Consider  now  in  more  detail  how  the  Baldwin-Lomax  algebraic  turbulence  model 
[3] might be applied to an unstructured mesh. The cells in the turbulent region are
arranged in streamwise stations (Fig. 12) and consist of those from the initial mesh
plus those introduced by the first embedding level. The first embedding level
defines the borders between viscous and inviscid regions and therefore completely
covers the turbulent region. During refinements beyond the first level no additional
streamwise stations are created.

Two kinds of quantities are necessary: those that characterize each cell and
others that characterize each station. Vorticity and the distance of the center of the
cell to the wall are both cell-based. The wall quantities $y^+, u^+$ and the Baldwin-Lomax
parameters $Y_{max}$, $F_{max}$, $U_{dif}$ characterize the entire station. The vorticity is
calculated using Green’s theorem over each cell: $\omega = \frac{1}{S_{cell}} \oint_{cell} (u\,dx + v\,dy)$, where
$S_{cell}$ is the cell area. The distance of each cell from the wall is calculated and stored
whenever the grid is updated. The station quantities $Y_{max}$, $F_{max}$ are calculated
by scanning through all of the cells that belong to each station. The function
$F(y) = y|\omega|\,(1 - \exp(-y^+/A^+))$ is formed for each cell, and its maximum value, which
occurs at distance $Y_{max}$ from the wall, is found. The wall quantities $y^+, u^+$ are
evaluated as averages from the station cells which are adjacent to the wall. The
basic assumption is that there is no significant variation in these quantities over the
streamwise length of a station.

Figure 12: Cells grouped by stations
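
The station scan for the maximum of the Baldwin-Lomax function can be illustrated as follows. This is a sketch under assumptions: each cell of a station is represented by a hypothetical `(y, omega)` pair (wall distance and vorticity), the conversion from wall distance to the wall coordinate is supplied as a single station value `u_tau_over_nu`, and the constant 26 is the value of the damping constant commonly used with the Baldwin-Lomax model.

```python
import math

A_PLUS = 26.0  # damping constant commonly used with the Baldwin-Lomax model


def station_fmax(cells, u_tau_over_nu):
    """Scan the cells of one station for the maximum of F(y).

    cells: iterable of (y, omega) pairs, one per cell in the station.
    u_tau_over_nu: station-averaged factor such that y_plus = y * u_tau_over_nu.
    Returns (f_max, y_max).
    """
    f_max, y_max = 0.0, 0.0
    for y, omega in cells:
        y_plus = y * u_tau_over_nu
        f = y * abs(omega) * (1.0 - math.exp(-y_plus / A_PLUS))
        if f > f_max:
            f_max, y_max = f, y
    return f_max, y_max
```

Because the scan only needs the list of cells assigned to a station, it does not rely on continuous normal mesh lines and the cells may come from any embedding level.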

The treatment for wakes is similar to the above with a few modifications. The
wake stations are arranged in pairs (Fig. 13) formed from upper and lower parts.
The minimum velocity cell is found by scanning through the cells of both stations
of each pair. Then cells ‘migrate’ from one station to its counterpart so that those
which are above the minimum velocity cell are assigned to the upper station, and the
remaining  are  assigned  to  the  lower  station  of  the  pair.  The  normal  distances  of  the 
cells  from  the  center  of  the  wake  then  are  modified  by  simply  adding  or  subtracting 
the  normal  distance  over  which  the  wake  moved  from  the  stored  normal  distances 
for  each  cell. 

A verification of the above approach has been carried out for the case of a NACA
0012 airfoil in a flow of $Re = 2.91 \times 10^6$, $M_\infty = 0.5$, $\alpha = 1.77^\circ$. An initial C-mesh
of 33x17 points and two levels of embedding were used for the computation [11].
Fig. 14(a) shows the pressure coefficient $C_p$ distribution for the laminar case, while
in Fig. 14(b) the turbulent case is compared to experiment. The computed $C_l$ of 0.197
may be compared with the experimental value of 0.195.


Figure  13:  Allocation  of  cells  to  turbulent  wake  stations 


CODING  WITH  UNSTRUCTURED  MESHES 


A classical structured grid is composed of quadrilateral cells which are arranged
so that each grid coordinate has an (i,j) index. State variables are then defined at
particular points in a two-dimensional array, the essential point being that neighbouring
points in the array structure are also neighbouring points in the physical
computational domain. Such an arrangement no longer exists when unstructured
grids are used, in which case all of the necessary operations are generally performed
in a ‘cell-wise’ manner. The domain consists of labeled cells, and is scanned over
cells rather than (i,j)-nodes. All cells are stored in a one-dimensional array and
use  is  made  of  indirect  addressing  to  obtain  the  values  of  the  variables  and  other 
information  that  may  be  needed  from  the  corners  of  each  cell. 

It  is  informative  to  consider  a  single  Do-loop  example  for  an  unstructured  mesh 
and  the  corresponding  loop  for  a  structured  mesh.  Say  that  the  loop  computes  the 
value  of  u-velocity  at  the  south  face  of  a  cell  (Fig.  15).  For  the  unstructured  mesh  the 
four  corners  are  stored  in  the  IC  array.  The  double  Do-loop  in  the  structured  mesh 
case  is  replaced  by  a  single  loop  of  twice  the  length  in  the  case  of  the  unstructured 
mesh.  This  is  a  desirable  feature  since  vectorization  is  generally  more  efficient  for 
lengthy  Do-loops.  On  the  other  hand,  more  memory  is  needed  for  the  pointers  used 
for  the  indirect  addressing.  There  is  also  an  extra  computational  cost  associated  with 
gather/scatter  operations  related  to  the  addressing.  However,  memory  overhead 
is becoming less important as storage capabilities increase, and the cost of the
gather/scatter operations is decreasing as computer architectures are optimized for
the operations which are used by the unstructured codes.
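
The contrast between the two loop styles can also be written out in Python. The sketch below is illustrative only: the connectivity array `ic` plays the role of the IC array, its corner ordering (SW, NW, NE, SE) is an assumption chosen so that the first and fourth entries are the south-west and south-east corners as in Fig. 15, and `build_connectivity` is a hypothetical helper that numbers the nodes of a regular mesh row by row.

```python
def south_face_structured(u, ni, nj):
    """Double loop over (i, j); u[j][i] holds the nodal values."""
    us = []
    for j in range(nj):
        for i in range(ni):
            us.append(0.5 * (u[j][i] + u[j][i + 1]))
    return us


def south_face_unstructured(u, ic):
    """Single loop over cells; u is a flat list indexed through the pointers in ic."""
    us = []
    for sw, _nw, _ne, se in ic:
        us.append(0.5 * (u[sw] + u[se]))   # indirect addressing (gather)
    return us


def build_connectivity(ni, nj):
    """Corner pointers (SW, NW, NE, SE) for a regular ni x nj cell mesh."""
    ic = []
    for j in range(nj):
        for i in range(ni):
            sw = j * (ni + 1) + i
            nw = sw + (ni + 1)
            ic.append((sw, nw, nw + 1, sw + 1))
    return ic
```

On a regular mesh both functions return the same south-face averages. The unstructured version pays for the extra pointer storage but does not care how the cells are ordered or whether some of them are embedded.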

COMMUNICATION  BETWEEN  CONTIGUOUS  GRIDS  (Interfaces) 

The existence of embedded regions within the interior of the computational domain
introduces internal boundaries (interfaces), which may be either spatial or
temporal  or  both.  For  example,  in  Fig.  5  there  are  both  spatial  and  temporal 
interfaces  which  coincide. 

An interface is generally characterized by an abrupt change in the cell and time-step
size which poses problems of accuracy. The grid lines may either continue across
the interface [8] or they may be interrupted by the interface [10,14]. In the latter
case, cells appear which contain additional nodes at the faces. Existing schemes
have been developed for cells with nodes at only the four corners and they need
some modification in order to take into account the extra face nodes.

structured mesh

      DO 10 J = 1, NJ
      DO 10 I = 1, NI
   10 US = 0.5*( U(I,J) + U(I+1,J) )

unstructured mesh

      DO 10 CELL = 1, NCELL
      SW = IC(1,CELL)
      SE = IC(4,CELL)
   10 US = 0.5*( U(SW) + U(SE) )

Figure 15: Differences in coding with structured and unstructured grids

A number of considerations must be taken into account when designing a numerical
scheme involving interfaces. Perhaps the foremost concern is maintaining
accuracy at an interface despite the stretching error that is related to the sudden
change in grid size. Second-order numerical schemes degrade to first-order and in
some cases become inconsistent where the grid is stretched. Special care is required
for face (hanging) nodes which are not surrounded by four cells. Another important
issue is the maintenance of conservation, by which is meant that interface fluxes
between cells cancel one another at a common interface ($F_1 + F_2 + F_3 = 0$ in
Fig. 16).

Most  common  smoothing  operators  are  of  second  order  for  shock  capturing  and 
fourth  order  for  the  suppression  of  the  spurious  oscillations  that  may  appear.  Both 
operators  have  a  stencil  larger  than  a  single  cell,  which  means  that  they  gather 
information  from  both  sides  of  the  interfaces.  Unfortunately,  interface  stretching 
deteriorates  the  smoothing  operator  accuracy.  An  interface  treatment  scheme  should 
also  be  ‘free-stream  preserving’,  meaning  that  it  accepts  uniform  flow  as  a  solution. 
Other  important  considerations  are  the  coding  complexity,  and  the  ease  with  which 
the interface treatment scheme can be extended to three dimensions. In practice, the
number of interfaces in a domain of $O(N^2)$ points is only $O(N)$. As a consequence,
the added computing time due to an interface is not an issue.

Figure 16: Conservative treatment: interface fluxes should cancel

It is clear that the above considerations impose serious limitations on the construction
of an interface scheme, and that in many cases these can be conflicting
requirements. The simultaneous achievement of both conservation and accuracy is
very difficult, and in most cases even impossible. However, not
all  of  the  above  factors  are  important  for  a  specific  interface.  For  regions  in  which 
gradients  are  small,  the  lack  of  accuracy  and/or  conservation  has  a  negligible  effect 
on  the  numerical  results.  Conservation  proves  to  be  an  important  property  in  cases 
of  moving  shocks  for  the  accurate  prediction  of  their  location  and  speed.  However,  it 
is  not  important  inside  a  shear  layer.  Conversely,  accuracy  may  be  more  of  an  issue 
in  a  boundary  layer  since  the  second  order  derivatives  (viscous  terms)  are  important 
and  they  are  more  ‘sensitive’  to  the  grid  stretching  error  than  first  order  derivatives, 
which  are  important  in  the  inviscid  regions. 

Let  us  now  describe  a  non-conservative  and  a  conservative  interface  treatment. 
The  non-conservative  treatment  is  illustrated  in  Fig.  17(a).  The  nodes  a,b  are 
integrated  using  the  parent  cell  B  instead  of  the  embedded  cells  C  and  D,  while 
ignoring  the  face  node  c.  The  values  at  node  c  are  obtained  by  interpolating  the 
values  from  nodes  a,b.  The  same  approach  can  be  employed  for  all  kinds  of  interfaces 
(e.g.  Fig.  17b).  This  treatment  is  also  easily  extended  to  three  dimensions.  Although 
it  is  non-conservative  it  does  avoid  the  stretching  error  associated  with  the  grid 
discontinuity at the interface. The nonconservation error is of $O(\Delta x)$ and it is local
to the interface region only. Therefore, in most cases it does not deteriorate
the accuracy of the solution appreciably. An appreciable error is expected only when
a shock is located very close and parallel to the interface or when a shock moves
through an interface. A more detailed investigation of the various interface
treatments, and especially the issue of nonconservative vs conservative treatments,
can be found in [9].

Figure 17: Nonconservative interface treatment

The  treatment  can  be  made  conservative  by  performing  a  special  integration  in 
cell  A  that  takes  into  account  the  face  node  c.  For  a  trapezoidal  integration  of  the 
flux  of  a  quantity  U  around  cell  A  the  interface  fluxes  are: 

$$F_A = 0.5(U_a + U_c)(X_c - X_a) + 0.5(U_c + U_b)(X_b - X_c) - 0.5(U_a + U_c)(Y_c - Y_a) - 0.5(U_c + U_b)(Y_b - Y_c).$$

This flux is cancelled by the corresponding fluxes from cells C and D:

$$F_C = 0.5(U_c + U_a)(X_a - X_c) - 0.5(U_c + U_a)(Y_a - Y_c)$$

$$F_D = 0.5(U_b + U_c)(X_c - X_b) - 0.5(U_b + U_c)(Y_c - Y_b).$$
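
The cancellation of the three interface fluxes can be checked numerically. The sketch below assumes that each node is given as a hypothetical `(U, X, Y)` tuple, with a and b the corner nodes on the interface and c the face node between them; the helper names are illustrative. Each edge contribution uses the same trapezoidal form as the expressions above, so traversing an edge in the opposite direction simply changes its sign.

```python
def edge_flux(n1, n2):
    """Trapezoidal contribution of the edge running from node n1 to node n2."""
    u1, x1, y1 = n1
    u2, x2, y2 = n2
    return 0.5 * (u1 + u2) * (x2 - x1) - 0.5 * (u1 + u2) * (y2 - y1)


def interface_fluxes(a, b, c):
    """Interface fluxes of coarse cell A and of the embedded cells C and D."""
    f_a = edge_flux(a, c) + edge_flux(c, b)   # cell A sees the face node c
    f_c = edge_flux(c, a)                     # cell C traverses a-c backwards
    f_d = edge_flux(b, c)                     # cell D traverses c-b backwards
    return f_a, f_c, f_d
```

For any values at a, b and c the three fluxes sum to zero, because cell A integrates over the same two half-edges as cells C and D, only in the opposite sense.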

In  the  normal  case  of  a  cell  with  four  nodes,  the  cell-area  is  divided  into  four  equal 
areas,  one  for  each  node  and  therefore  a  fourth  of  the  total  cell-change  is  allocated 
to  each  node.  However,  when  a  fifth  face  node  exists,  the  cell  area  is  allocated  to 
the  five  nodes  as  shown  in  Fig.  18(b)  with  the  resulting  distribution  coefficients 
shown  in  Fig.  18(a).  A  similar  ‘geometric’  approach  can  be  applied  to  obtain  the 
distribution  for  the  cases  of  cells  with  two,  three  and  four  face  nodes.  A  more 
rigorous derivation of the distribution formulas, which uses shape functions, has been
employed and verifies the above ‘geometric’ approach [1]. However, this treatment
suffers  from  the  stretching  error  which  is  especially  severe  when  the  viscous  terms 
are important. For example, cell ABCD (Fig. 18(c)) is used to evaluate a second
derivative of $u$ at node b, which is not at the cell center.

For time accurate computations, where time and spatial interfaces coincide, both
cell-size and time-step size change together (Fig. 5). The non-conservative treatment
remains basically the same. The interface nodes a, b and c are considered a part of the
coarser level and are integrated once with a time-step of $2\delta t$ after the finer cells are
integrated twice with a time-step of $\delta t$. However, special care must be taken in order
for the interface fluxes to cancel each other and make the treatment conservative
(Fig. 19). During the first and second steps the fine cells are integrated at time
level n but the distributions to the interface nodes, as shown by arrows in the figure,
are saved for later updating. At the third step, the coarser cells are integrated and
now the saved distributions from the fine cells are assigned together with those from
the coarser cells. Since the interface nodes are kept at the old time level n during
the first two steps, the interface fluxes cancel each other at the third step and the
treatment becomes conservative.

Figure 18: Conservative interface treatment

Figure 19: Conservative interface treatment in time (step 1, step 2, step 3)

EXAMPLES  OF  ADAPTIVE  CALCULATIONS 

Some examples of results will best illustrate the capabilities of adaptive algorithms.
In all the cases the non-conservative interface treatment was used.

The first example involves supersonic flow through an 8% circular arc cascade,
with an entrance $M_\infty = 1.4$ and $Re = 23 \times 10^3$ [10].
An initial mesh of 25x25 was used followed by three levels of embedding. An
oblique shock forms at the leading edge of the arc section and it is reflected at the
upper symmetry boundary. The reflected shock then interacts with the boundary
layer at the trailing edge region. In Fig. 20, the grid evolution demonstrates how
the embedded grids follow the detailed physics of the flow. Figure 20(e) provides an
enlarged detail of the grid near the surface and Figure 21 shows the flow field in terms
of  Mach  number  and  Cp  contour  plots.  The  boundary  layer  is  essentially  ‘lifted’ 
by  the  adverse  pressure  gradient  which  in  turn  is  induced  by  the  reflected  shock. 
Simultaneously,  because  of  the  effective  corner  which  is  formed  by  the  boundary 
layer,  compression  waves  are  formed  upstream  of  the  interaction  region  and  coalesce 
into  a  weaker  shock  which  impinges  on  the  upper  boundary.  Note  that  passage  of 
the  shock  through  interfaces  does  not  induce  any  stability  problems.  The  local 
embedding  procedure  appears  to  effectively  capture  the  detailed  physics  of  a  flow 
field  in  the  presence  of  rather  complicated  multiple-scale  phenomena.  Equation 
adaptation  was  also  used  to  limit  the  solution  of  the  Navier-Stokes  equations  to 
only  within  the  viscous  region. 

The feature detection procedure and in particular the utility of the embedding
patches together with directional embedding are evaluated in a second example.
The case of a NACA 0012 airfoil in a flow of $Re = 10^5$ and $M_\infty = 0.8$ at an
angle of attack of 2° shows the effectiveness of the feature detecting procedure
[11]. Figure 22(a) shows Mach number contours immediately prior to embedding
(note  the  enlarged  vertical  scale).  The  dominant  features  that  are  apparent  are  a 
normal  shock  on  the  suction  side  and  two  boundary  layers,  one  on  either  side  of  the 
airfoil.  Figure  22(b)  shows  the  resulting  adaptive  grid.  It  can  be  seen  that  there  is 
directional  embedding  above  and  below  the  airfoil,  while  the  leading  edge  region  is 
embedded  in  both  directions.  The  downstream  region  between  the  two  shear  layers 
remains unembedded. Comparison of the grid and the resulting solution makes
it quite evident that the features are faithfully captured. The borders between the
different  regions  are  fairly  free  from  ‘noise’  cells.  Figure  23  shows  an  enlarged  view 
of  the  flow  field  (Mach  contours)  for  the  separation  region  on  the  suction  side  of  the 
airfoil.  The  flow  details  appear  to  be  very  well  captured  by  the  method.  In  cases 
completed  so  far,  there  appears  to  be  no  restriction  on  the  number  of  adaptations 
carried  out  during  a  computation. 


ADAPTATION  EVALUATION 


The  previously  described  adaptation  methods  must  also  be  considered  in  terms 
of  accuracy  and  CPU  time  savings. 

Accuracy 

The  essential  advantage  of  spatial  embedding  is  that  resolution  is  introduced  only 
into  regions  where  large  flow  gradients  exist.  Since  regions  with  relatively  uniform 
flow  do  not  need  resolution,  an  embedded  grid  provides  an  accuracy  equivalent 
to  a  globally  fine  grid.  All  of  those  errors  which  are  introduced  due  to  interfaces 
(e.g.  non-conservation  and  stretching  errors)  scale  with  the  size  of  the  interface 
cells.  As  the  algorithm  refines  the  region  where  a  feature  exists,  interfaces  are 
introduced  in  gradually  finer  regions  and  all  errors  diminish.  The  outer  (first)  level 
interfaces are susceptible to the largest errors but are generally positioned away
from the regions of largest gradients and do not introduce appreciable inaccuracies.
For a boundary layer, the first level interface is normally placed outside of
the layer’s edge, while finer embedded grids appear within the layer. Apart from the
interface  issue,  the  accuracy  of  the  embedded  grids  is  normally  very  good  since  the 
refinement  is  governed  by  the  flow  gradients:  larger  gradients  imply  finer  embedded 
grids.  Figure  24  compares  skin  friction  distributions  for  the  previously  described 
supersonic  circular  arc  cascade  but  for  two-level  embedding  and  a  globally  fine  grid  of 
97x97  [10].  The  agreement  is  excellent.  Figure  25  shows  a  separated  velocity  profile 
for the same arc cascade in subsonic flow ($M_\infty$), and the presence of three interfaces
does  not  introduce  any  kinks  to  the  profile.  The  use  of  equation  adaptation  does 
not  introduce  accuracy  questions  since  the  viscous  terms  are  omitted  only  where 
they  are  negligible.  Their  neglect  is  a  very  accurate  approximation. 

The same remarks that were made for the spatial embedding are applicable to
temporal embedding as well. Errors related to temporal accuracy scale with time-step
size, which means that they are proportional to cell size. Figure 26 illustrates the
case of a flow with an oscillatory Mach number at the inlet ($M_\infty = 0.8 + 0.04 \sin t$),
and for which there are significant temporal gradients over the entire domain [11].
One  level  of  embedding  has  been  used  as  shown  in  the  figure.  Both  curves  represent 
time  histories  of  the  U-velocity  component  at  a  specific  node  of  the  domain.  The 
agreement  is  very  good. 

Two other adaptation properties improve the numerical accuracy. The first is
a diminishing of the smoothing error which usually scales with the cell size. This
is most important within the viscous regions where smoothing can contaminate the
solution seriously. The approach that often has been used switches off smoothing
gradually as the wall is approached. This allows considerable smoothing in regions
with large gradients which are away from the wall (e.g. shock/boundary layer interaction
region). Embedding inherently provides a way of diminishing this error
and quite often switches off smoothing gradually as more viscous regions are approached,
wherever they may be. The second is the ability to make use of a good
initial  mesh  (minimal  stretching,  orthogonal).  The  embedded  grids  have  the  same 
good  properties  as  the  initial  mesh. 

A question does arise with respect to the reliability of the feature detection. Accuracy
depends to a great extent upon the correct detection of the flow features. The
two main factors of uncertainty that exist in the detection procedure are the choice
of appropriate detection parameters and the determination of accurate threshold
values. Experience to date indicates that it is relatively easy to decide what parameters
should be used for certain classes of problems and that the threshold value
choice is not so crucial. A worst case may provide excessive embedding for some
feature, which translates into smaller CPU savings; however, this will not affect the
accuracy.

CPU  time  savings 

The  original  motivation  for  the  introduction  and  use  of  adaptation  methods  is  a 
reduction  in  CPU  time  without  loss  of  accuracy.  The  savings  in  computation  time 
have  proven  to  be  quite  significant  in  most  cases. 

Several  main  factors  govern  the  amount  of  CPU  time  savings  which  an  embedded 
grid  offers  as  compared  to  a  structured  grid  of  the  same  resolution  level.  First  is  the 
number  and  extent  of  the  flow  features  with  respect  to  the  size  of  the  computational 
domain.  The  greater  the  extent,  the  lesser  the  savings;  the  larger  the  number  of 
embedding  levels,  the  higher  the  savings.  A  large  number  of  embedding  levels  allows 
the  use  of  a  relatively  coarse  initial  mesh,  and  any  comparison  with  a  globally  fine 
mesh  with  the  same  refinement  level  as  the  finest  embedded  grid  indicates  high  CPU 
savings.  However,  there  are  some  limitations  on  how  coarse  the  initial  mesh  can  be. 
In a high Reynolds number case, the final spacing at the wall may be of the order
of $10^{-6}$, which means that the initial mesh will have a wall spacing of the order of
$10^{-6}$, which is still quite fine. The initial mesh should have some points inside the
boundary layer (approximately four). Otherwise, the Navier-Stokes solver will ‘see’
a sudden jump in velocity from zero to free-stream and it may diverge. Also, a
very  coarse  initial  mesh  may  yield  inaccurate  feature  detection  which  will  ‘guide’ 
adaptation  in  the  wrong  way.  This  is  corrected  at  the  next  adaptation,  but  some 
CPU  time  is  wasted. 

Let us now proceed to an examination of the efficiency of each one of the adaptation
methods. Grid embedding is the most effective and offers the most savings.
The savings are proportional to the ratio R of the cells of the equivalent globally
fine mesh to those of the embedded mesh. Each time an initial mesh with N cells
is embedded, the resulting number of cells is N + 3fN, where f is the fraction of
initial cells which are divided. Therefore, after k adaptations, the ratio R is:

$$R = \frac{4^k}{(1 + 3f_1)(1 + 3f_2) \cdots (1 + 3f_k)}$$

N does not enter the above expression directly; however, it affects the number of
adaptations k, since the use of a relatively coarse initial mesh leads to a large
number of embedding levels k in order to reach a certain refinement level. The
value of the embedded cell fraction f depends upon the extent of the features to
be resolved in the domain. If features cover most of the field, then $f \to 1$ and
thus $R \to 1$, which implies that the saving is not important. Conversely, if the
features are confined to a small portion of the domain, $f \to 0$ and $R \to 4^k$. Also,
the larger the number of adaptations k, the bigger is R. Typical values for f may be
$f_1 = 0.3$, $f_2 = 0.15$, $f_3 = 0.1$, $f_4 = 0.05$, which yields $R \approx 62$. Directional embedding
leads  to  significant  savings  by  virtue  of  introducing  only  one  extra  cell  compared  to 
the  three  additional  when  division  in  both  cell-directions  is  applied.  In  this  case, 
the  above  ratio  R  is: 

$$R = \frac{4^k}{(1 + f_1)(1 + f_2) \cdots (1 + f_k)}$$
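
As a quick arithmetic check of the cell-count ratio, the following minimal Python sketch evaluates R for the typical embedded fractions quoted above. The function name `cell_ratio` and its `extra_cells` parameter are illustrative and not part of the original work; the sketch also assumes that the equivalent globally fine mesh grows by a factor of 4 per level in the directional case as well.

```python
from math import prod

def cell_ratio(fractions, extra_cells=3):
    """Ratio R of globally fine cells to embedded cells.

    fractions:   embedded cell fraction f_i at each of the k adaptation levels.
    extra_cells: cells added per divided cell (3 for division in both
                 directions, 1 for directional division).
    """
    # Assumption: the equivalent globally fine mesh has 4**k times the
    # initial number of cells after k levels.
    fine = 4 ** len(fractions)
    # Each level multiplies the embedded cell count by (1 + extra_cells * f_i).
    embedded = prod(1 + extra_cells * f for f in fractions)
    return fine / embedded

typical = [0.3, 0.15, 0.1, 0.05]
r_both = cell_ratio(typical)                        # division in both directions
r_directional = cell_ratio(typical, extra_cells=1)  # one extra cell per division
print(f"both directions: R = {r_both:.1f}")
print(f"directional:     R = {r_directional:.1f}")
```

With these fractions the two-direction ratio comes out close to the value of 62 quoted in the text, and the limits f → 1 and f → 0 reproduce R → 1 and R → 4^k.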

It  should  be  emphasized  that  the  savings  in  CPU-time  due  to  embedding  is  not 
governed  only  by  the  above  ratio  R.  The  solution  in  an  embedded  mesh  advances  to 
steady-state  much  faster  than  in  the  corresponding  globally  fine  mesh  since  there  are 
coarse  regions  in  the  domain  which  allow  considerably  larger  time-steps.  Equation 
adaptation  is  less  effective  and  in  fact,  did  not  provide  significant  savings  for  viscous 
two  dimensional  cases  since  the  majority  of  the  cells  are  inside  the  boundary  layer. 
Nevertheless,  significant  savings  are  to  be  expected  when  appreciable  resolution  is 
needed  in  the  inviscid  region  and  in  three  dimensional  applications.  The  use  of 
time-steps  that  vary  spatially  according  to  the  embedding  zones  for  time  accurate 
calculations leads to significant savings provided that the sizes of the cells within
the same embedding zone are similar. If the cell-sizes vary significantly within the
same  embedding  zone,  then  significant  savings  are  not  expected  since  all  cells  are 
integrated  with  the  time-step  of  the  smallest  cell  of  the  embedding  zone.  Table  1 
shows  the  CPU  time  savings  factor  for  each  of  the  methods  when  applied  to  the 
steady state field generated by a circular arc cascade in subsonic flow ($M_\infty = 0.5$).
Embedding  in  both  directions  gives  a  speed-up  factor  of  13,  directional  embedding 
increases  the  factor  to  24,  and  equation  adaptation  to  28,  which  is  the  total  CPU 
time  savings  factor  due  to  the  combination  of  all  adaptation  methods.  For  the 
unsteady case of an oscillating inlet Mach number flow, the use of spatially varying
time-steps provides 25% CPU time savings per period with the use of only two
embedding levels, compared to the identical embedded mesh using the globally minimum
time-step. Finally, updating pointers after each adaptation does not increase
the  computation  time  appreciably,  especially  if  adaptation  is  applied  only  a  few 
times  during  the  computation. 

CONCLUDING  REMARKS 

Viscous flows are very demanding in terms of computation time and grid quality.
The current low order numerical schemes on conventional structured meshes
are quite expensive. Algorithms which can modify both the grid and the equations
used in the course of a computation in order to resolve flow physics seem to be
promising for more efficient viscous computations. Methods such as adaptive local
grid refinement, equation adaptation and temporal adaptation have been applied
quite successfully, yielding significant savings in CPU-time without sacrificing accuracy.
On the other hand, the use of unstructured (embedded) grids introduces extra
complications. A data structure system is needed in order to keep the required information,
and interfaces require special treatment. Also, the detection of features
is very important for accuracy. These tasks, however, can be successfully tackled
in many cases and do not seem to impose serious problems in the application of
adaptive algorithms to flows of engineering interest.

Figure 21: Solution contours for 8% circular arc cascade ($M_\infty = 1.4$)

Figure 22: Effectiveness of embedding patches in feature detection

Figure 23: Enlargement of separated region

Figure 24: Comparison of skin friction distributions for adapted, globally fine grids (legend: embedded grid, 2 levels; globally fine grid, 97x97)

Figure 25: Separated boundary layer profile in the presence of interfaces

Figure 26: Velocity history at x = 0.5, y = 0.6 for oscillating inlet flow

INTRODUCTION

The  Navier-Stokes  equations  have  a  range  of  validity  that  covers  most  of  the 
flows  of  engineering  interest.  Their  numerical  solution  is  expensive  and  quite  often 
a  subset  of  the  equation  system  is  employed.  However,  numerous  problems  require 
the  use  of  the  full  Navier-Stokes  equations  for  either  the  description  of  the  entire  flow 
field,  or  the  description  of  limited  regions  of  the  domain.  Rapid  progress  in  computer 
technology  has  made  the  solution  of  the  Navier-Stokes  equations  less  expensive  and, 
although  it  cannot  yet  be  ranked  along  with  the  standard  theoretical  tools  currently 
used  in  applied  aerodynamics,  it  is  certain  that  in  the  near  future  it  will  be  used  in 
routine  engineering  calculations. 

Methods  that  have  been  used  for  the  numerical  solution  of  the  Navier-Stokes 
equations can be divided into four main categories: (i) finite difference, (ii) finite
volume, (iii) finite element, and (iv) spectral methods. Finite difference and finite
volume  methods  are  the  most  widely  used  at  present. 

We  will  consider  the  finite-volume  approach  in  general  without  going  into  specific 
descriptions  of  the  various  schemes  that  have  been  developed.  Surveys  of  numerical 
schemes for solving the Navier-Stokes equations may be found in [12,7,8]. The emphasis
here will be on the basic principles and on those problems that are common
to most of the schemes. These include such issues as accuracy, viscous grid requirements,
smoothing, etc. First, the Navier-Stokes equations are presented and the
finite-volume discretization is described. Next, finite-volume approaches to evaluating
viscous terms are examined, and issues related to spatial accuracy and requirements
of a viscous grid are addressed, including the effects of artificial dissipation
on accuracy, especially within the shear layer. Lastly, a conservative, finite-volume
scheme that has recently been developed will be presented and investigated with
respect to the issues that have been addressed.


NAVIER-STOKES  EQUATIONS 

The system may be written in cartesian two-dimensional conservation form as:

$$\frac{\partial U}{\partial t} + \frac{\partial F}{\partial x} + \frac{\partial G}{\partial y} = \frac{\partial R}{\partial x} + \frac{\partial S}{\partial y}$$

where U, F and G are the state and convective flux vectors in the x- and y-directions respectively, and
R and S are the viscous flux vectors. In the viscous flux vectors, $\tau_{xx}$, $\tau_{yy}$, $\tau_{xy}$ are viscous
stresses, and $q_x$, $q_y$ are heat conduction terms. They are given by the following relations:

$$\tau_{xx} = (\lambda + 2\mu)\frac{\partial u}{\partial x} + \lambda\frac{\partial v}{\partial y}$$

$$\tau_{yy} = (\lambda + 2\mu)\frac{\partial v}{\partial y} + \lambda\frac{\partial u}{\partial x}$$

$$\tau_{xy} = \mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)$$

In the above relations $\rho$ is density, u and v are velocity components, E is total
internal energy per unit volume, p is pressure and T is the temperature. For a
perfect gas, the pressure is related to the total internal energy E by

$$p = (\gamma - 1)\left[E - \frac{\rho}{2}(u^2 + v^2)\right]$$

Also, $\lambda$ and $\mu$ are the viscosity coefficients and k is the coefficient of thermal conductivity.
The Stokes relation is used to eliminate the bulk viscosity coefficient $\lambda$,

$$3\lambda + 2\mu = 0$$

and $\mu$ is a known function of temperature T (e.g. Sutherland’s law).


FINITE  VOLUME  DISCRETIZATION 

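The above two-dimensional Navier-Stokes equations are considered in the integral
form

$$\iint_S \frac{\partial U}{\partial t}\,dS + \iint_S \left(\frac{\partial F}{\partial x} + \frac{\partial G}{\partial y}\right) dS = \iint_S \left(\frac{\partial R}{\partial x} + \frac{\partial S}{\partial y}\right) dS \qquad (1)$$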

The surface integrals are evaluated by using Green’s theorem, which is the heart
of the finite-volume approach. Consider an area S and its contour $\partial S$ which is simply
connected and piecewise smooth, a vector V = (P, Q) that is twice continuously
differentiable, and the unit vector n which is normal to the contour $\partial S$.

One form of Green’s theorem in cartesian coordinates is:

$$\iint_S \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \oint_{\partial S} (P\,dx + Q\,dy) \qquad (2)$$

The calculation of a surface integral is thus reduced to the evaluation of a line
integral. Using Eq.(2) the integral form of the Navier-Stokes equations (1) becomes:

$$\frac{\partial}{\partial t}\iint_S U\,dS + \oint_{\partial S} (F\,dy - G\,dx) = \oint_{\partial S} (R\,dy - S\,dx) \qquad (3)$$

The  flow  domain  S  consists  of  smaller  areas  (cells).  The  cells  are  defined  through 
the  coordinates  of  their  vertices  which  is  the  only  grid  information  that  is  needed. 
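The integrals over the entire domain S in Eq.(3) may be replaced by the sum of
integrals over each cell:

$$\sum_{cells} \frac{\partial}{\partial t}\iint_{cell} U\,dS + \sum_{cells} \oint_{cell} (F\,dy - G\,dx) = \sum_{cells} \oint_{cell} (R\,dy - S\,dx) \qquad (4)$$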

There  is  a  physical  interpretation  for  each  of  the  above  terms.  The  first  term 
expresses  the  total  change  with  time  of  a  quantity  (e.g.  density)  for  all  cells,  while 
the  second  term  is  the  sum  of  the  net  fluxes  of  the  same  quantity  through  the  faces 
of  the  cells.  The  term  on  the  right  hand  side  represents  viscous  fluxes  through  the 
faces.

Figure 1: Equivalence of finite difference and volume modeling


If the flow field is uniform, then equation (2) yields [6,15]:

$$\oint dx = \oint dy = 0 \qquad (5)$$

which  means  that  the  cells  should  be  closed  surfaces.  A  second  geometric  constraint 
is  that  the  sum  of  the  cell-areas  should  equal  the  area  of  the  entire  domain.  Both 
geometric  constraints  together  basically  ‘define’  the  kind  of  control  areas  that  may 
be  used,  which  are  usually  quadrilaterals  and  triangles. 
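
The cell-by-cell flux balance and the closed-cell constraint can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `cell_flux_balance` and `cell_area` are hypothetical, faces are straight segments between vertices listed counter-clockwise, and each face integral uses the midpoint rule.

```python
def cell_flux_balance(vertices, F, G):
    """Discrete contour integral of (F dy - G dx) around a polygonal cell.

    vertices: list of (x, y) pairs in counter-clockwise order.
    F, G:     callables returning the flux components at a point (x, y).
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)  # face midpoint
        total += F(xm, ym) * (y1 - y0) - G(xm, ym) * (x1 - x0)
    return total

def cell_area(vertices):
    """Shoelace formula; positive for counter-clockwise ordering."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return 0.5 * twice_area

# A skewed quadrilateral cell defined only by its vertex coordinates.
quad = [(0.0, 0.0), (2.0, 0.3), (2.5, 1.7), (-0.2, 1.1)]

# Uniform flow: the net flux vanishes because the cell contour is closed.
uniform = cell_flux_balance(quad, lambda x, y: 3.0, lambda x, y: -1.5)

# F = x, G = y has divergence 2, so the balance equals twice the cell area.
linear = cell_flux_balance(quad, lambda x, y: x, lambda x, y: y)
print(uniform, linear, 2.0 * cell_area(quad))
```

Only the vertex coordinates enter the computation, which matches the remark that they are the only grid information needed.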

Consider the relation between the finite-volume and finite-difference approaches
taking as an example an evaluation of the first order derivative $\partial u/\partial x$ at the point P
in the cartesian cell of Fig. 1. Using the finite-volume approach,

$$\left(\frac{\partial u}{\partial x}\right)_P = \frac{1}{S}\oint u\,dy = \frac{u_{i+1} - u_{i-1}}{\Delta x}$$

which is identical to central finite-differencing.

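Green’s formula provides a derivative value which is averaged over the cell-area
but does not necessarily apply at the center of the cell. In fact, the substitution
$(\partial u/\partial x)_P = \frac{1}{S}\oint u\,dy$ holds only if P is located at the centroid of the cell. This can be
seen from a Taylor series expansion of $\frac{\partial u}{\partial x}(x,y)$ around the point $(x_0,y_0)$ so that

$$\iint_S \frac{\partial u}{\partial x}\,dS = \iint_S \frac{\partial u}{\partial x}(x_0,y_0)\,dS + \iint_S (x - x_0)\frac{\partial^2 u}{\partial x^2}(x_0,y_0)\,dS + \iint_S (y - y_0)\frac{\partial^2 u}{\partial x\,\partial y}(x_0,y_0)\,dS + \cdots$$

Therefore,

$$\frac{\partial u}{\partial x}(x_0,y_0) = \frac{1}{S}\iint_S \frac{\partial u}{\partial x}\,dS$$

holds exactly only if

$$\iint_S (x - x_0)^m (y - y_0)^n\,dS = 0$$

for m, n = 0,1,... which means that $(x_0,y_0)$ must lie at the centroid of the cell. For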
simple  geometric  shapes  such  as  parallelograms,  the  centroid  in  fact  does  coincide 
with  the  center;  for  general  shapes  they  differ.  The  error  that  results  when  it  is 
assumed  the  two  coincide  is  of  higher  order  in  most  cases. 
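
The centroid property can be illustrated with a short Python sketch. For u = x^2 the exact derivative is 2x, so the cell-averaged derivative from Green's formula should equal 2x evaluated at the centroid, not at the average of the vertices. The function names and the trapezoidal test cell are illustrative assumptions; Simpson's rule is used on each straight face because it integrates a quadratic exactly.

```python
def polygon_area_centroid(verts):
    """Area and centroid of a simple polygon (counter-clockwise vertices)."""
    a = cx = cy = 0.0
    n = len(verts)
    for i in range(n):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return a, cx / (6.0 * a), cy / (6.0 * a)

def green_dudx(verts, u):
    """(1/S) * contour integral of u dy, with Simpson's rule on each face."""
    area, _, _ = polygon_area_centroid(verts)
    total = 0.0
    n = len(verts)
    for i in range(n):
        (x0, y0), (x1, y1) = verts[i], verts[(i + 1) % n]
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        face_mean = (u(x0, y0) + 4.0 * u(xm, ym) + u(x1, y1)) / 6.0
        total += face_mean * (y1 - y0)
    return total / area

# A trapezoid: its centroid differs from the average of its vertices.
cell = [(0.0, 0.0), (2.0, 0.0), (1.5, 1.0), (0.0, 1.0)]
area, xc, yc = polygon_area_centroid(cell)
x_vertex_mean = sum(x for x, _ in cell) / len(cell)

averaged = green_dudx(cell, lambda x, y: x * x)
print(averaged, 2.0 * xc, 2.0 * x_vertex_mean)
```

The averaged derivative matches the value at the centroid to round-off, while the value at the vertex average differs slightly, in line with the higher-order discrepancy discussed in the text.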


VISCOUS  TERMS  EVALUATION 


We  now  focus  on  various  ways  in  which  the  viscous  terms  of  the  Navier-Stokes 
equations  may  be  calculated.  Second  order  derivatives  are  involved  and  usually 
require  two  separate  steps.  A  first  step  computes  stresses  and  the  second  evaluates 
the  viscous  terms.  Normally  viscous  terms  would  require  a  larger  computational 
molecule  than  inviscid  terms;  however  most  schemes  do  avoid  that  large  stencil. 

The  primary  considerations  of  such  schemes  with  regard  to  the  discretization  of 
viscous  terms  are: 

•  A  small  computational  molecule  and  compactness 

•  accuracy 

•  cost  of  computations 

•  numerical  dissipation 

•  conservation 

•  free-stream  preservation 

No one method copes well with all of the above somewhat contradictory considerations.

We  now  consider  various  approaches  for  evaluating  viscous  terms.  Consider  the 
grid  shown  in  Fig.  2  with  state  variables  stored  at  the  cell  centers. 

In order to evaluate a typical viscous term (e.g. $u_{xx}$) at a cell center P, an integration
is performed over the cell:

$$(u_{xx})_P = \frac{1}{S}\iint_S u_{xx}\,dS = \frac{1}{S}\oint_{ABCD} u_x\,dy = \frac{1}{S}\left\{(u_x)_{CB}(y_C - y_B) + (u_x)_{DC}(y_D - y_C) + (u_x)_{AD}(y_A - y_D) + (u_x)_{BA}(y_B - y_A)\right\} \qquad (6)$$

where CB, DC, AD, and BA are the points in the middle of the faces of the primary
cell ABCD of Fig. 2. The unknown first order derivatives in Eq.(6) then need to be
evaluated. One approach performs a second integration over the secondary
volume A'B'C'D' [11,13]:

$$(u_x)_{CB} = \frac{1}{S'}\iint_{S'} u_x\,dS = \frac{1}{S'}\oint_{A'B'C'D'} u\,dy = \frac{1}{S'}\left\{u_B(y_{B'} - y_{A'}) + u_E(y_{C'} - y_{B'}) + u_C(y_{D'} - y_{C'}) + u_P(y_{A'} - y_{D'})\right\} \qquad (7)$$

where

$$u_C = \frac{1}{4}(u_P + u_E + u_{NE} + u_N), \qquad u_B = \frac{1}{4}(u_S + u_{SE} + u_E + u_P),$$

and S, S' are the areas of the primary and secondary cells respectively. Similar
averages are introduced for each of the necessary quantities at the various points.
The computational molecule involves 9 points and is also the same molecule that
is employed for the discretization of inviscid terms. If the solution is uniform, then
$u_C = u_B = u_P = u_E$ and Eq.(7) yields $(u_x)_{CB} = 0$, and consequently $(u_{xx})_P = 0$,
which means that the discretization accepts uniform flow as a solution (free-stream
preserving).

Figure 2: Viscous terms discretization
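
The secondary-cell integration of Eq.(7) can be sketched in Python as follows. The function names and argument layout are hypothetical, chosen only for this illustration; the secondary cell A'B'C'D' is described by the y-coordinates of its four vertices and its area S'.

```python
def vertex_average(u_1, u_2, u_3, u_4):
    """Simple average of the four cell-center values surrounding a vertex."""
    return 0.25 * (u_1 + u_2 + u_3 + u_4)

def stress_at_face(u_B, u_E, u_C, u_P, y_A, y_B, y_C, y_D, area):
    """Eq.(7): (u_x) at face CB from a line integral around A'B'C'D'.

    y_A, y_B, y_C, y_D are the y-coordinates of the secondary-cell
    vertices A', B', C', D'; area is the secondary-cell area S'.
    """
    return (u_B * (y_B - y_A) + u_E * (y_C - y_B)
            + u_C * (y_D - y_C) + u_P * (y_A - y_D)) / area

# Free-stream preservation: a uniform field gives zero stress on any cell,
# because the y-increments around a closed contour sum to zero.
u0 = 3.0
uniform = stress_at_face(u0, u0, u0, u0, 0.1, -0.2, 0.9, 1.3, 0.8)

# Cartesian special case with spacing h: the horizontal faces drop out and
# the result reduces to (u_E - u_P) / h.
h = 0.5
u_P, u_E, u_N, u_NE, u_S, u_SE = 1.0, 1.4, 1.1, 1.6, 0.9, 1.2
u_C = vertex_average(u_P, u_E, u_NE, u_N)
u_B = vertex_average(u_S, u_SE, u_E, u_P)
cartesian = stress_at_face(u_B, u_E, u_C, u_P,
                           -0.5 * h, -0.5 * h, 0.5 * h, 0.5 * h, h * h)
print(uniform, cartesian, (u_E - u_P) / h)
```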

We now examine whether or not an odd-even mode is accepted by the above
discretization. For example, suppose a solution at the grid points N, P, E, NE of a
cartesian grid has the form of a sawtooth mode. Then $u_C = u_P = 0$ and the line
integration around the secondary cell A'B'C'D' yields $(u_x)_{CB} = 0$. Consequently,
$(u_{xx})_P$ vanishes, implying that such a solution is accepted by the scheme.

Another method makes use of averaging in order to obtain the stresses at the cell-faces:
$(u_x)_{CB} = \frac{1}{2}\{(u_x)_P + (u_x)_E\}$ (Fig. 2), and then the cells ABCD and BFGC can
be employed to calculate $(u_x)_P$ and $(u_x)_E$. This, however, results in a large stencil.
Numerous other variations can be used, differing mainly in the way the averaging is
performed in order to obtain the required information at various points, and in the
way the line integrations are carried out (e.g. midpoint rule, trapezoidal, etc). The
use of the small area ABCD around point P (Fig. 2) reduces the computational
molecule to only 9 points. The above discretization is also conservative; that is, the
viscous fluxes cancel one another at the faces of the primary volumes. For example,
the flux $F_{CB} = (u_x)_{CB}(y_C - y_B)$ of the volume centered at P is canceled identically
by the flux $F_{BC} = (u_x)_{CB}(y_B - y_C)$ of the volume centered at E.

Another class of methods are those which switch to finite differences when evaluating
stresses, instead of using integrations over the secondary cells [2]. For example,
the finite-difference value of $(u_x)_{CB}$ on a cartesian mesh is

$$(u_x)_{CB} = \frac{u_E - u_P}{x_E - x_P}$$

Generally for a non-cartesian mesh, the finite-difference expressions for first order
derivatives of a quantity U are given by the relations (Fig. 3):

$$\frac{\partial U}{\partial x} = \frac{\Delta_m U\,\Delta_l y - \Delta_l U\,\Delta_m y}{\Delta_m x\,\Delta_l y - \Delta_l x\,\Delta_m y}$$

$$\frac{\partial U}{\partial y} = \frac{\Delta_l U\,\Delta_m x - \Delta_m U\,\Delta_l x}{\Delta_m x\,\Delta_l y - \Delta_l x\,\Delta_m y}$$

where

$$\Delta_m U = U_E - U_W, \qquad \Delta_l U = U_N - U_S,$$

etc. The approach remains conservative since the second integration over the primary
cells, that is carried out to obtain second order derivatives, is conservative.
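
A minimal Python sketch of the non-cartesian finite-difference expressions follows. The function name `fd_gradient` and the sample stencil are illustrative assumptions; the differences are taken between the E/W and N/S neighbours as in the relations above, and the check uses a linear field, for which the formulas should be exact.

```python
def fd_gradient(dm_U, dl_U, dm_x, dl_x, dm_y, dl_y):
    """First derivatives of U from differences along two mesh directions.

    dm_* are differences between the E and W points, dl_* between N and S.
    """
    det = dm_x * dl_y - dl_x * dm_y
    dUdx = (dm_U * dl_y - dl_U * dm_y) / det
    dUdy = (dl_U * dm_x - dm_U * dl_x) / det
    return dUdx, dUdy

def field(x, y):
    """Linear test field with gradient (3, -2)."""
    return 3.0 * x - 2.0 * y + 1.0

# A skewed, non-orthogonal stencil (hypothetical coordinates).
E, W = (1.0, 0.2), (-0.9, -0.1)
N, S = (0.3, 1.1), (-0.2, -0.8)

dUdx, dUdy = fd_gradient(
    field(*E) - field(*W), field(*N) - field(*S),
    E[0] - W[0], N[0] - S[0],
    E[1] - W[1], N[1] - S[1],
)
print(dUdx, dUdy)
```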

So  far,  we  have  considered  variables  to  be  stored  at  the  centers  of  the  cells.  Still 
another  class  of  methods  stores  the  variables  at  cell  vertices  [4,9].  Again  primary 
and  secondary  control  areas  are  employed  in  order  to  compute  the  viscous  terms. 
One  such  method  will  be  described  in  some  detail  in  a  later  section. 

Viscous  terms  at  boundaries  require  information  from  outside  of  the  domain  in 
order  to  be  evaluated  (pseudo  cells)  because  the  boundaries  interrupt  the  primary 
and  secondary  cell  arrangement.  In  most  cases,  however,  viscous  terms  are  not 
required  at  boundaries  since  boundary  conditions  are  employed  there. 

SPATIAL  ACCURACY 


An evaluation of viscous terms faces more severe accuracy problems than do
inviscid terms which consist of lower order derivatives. The higher order derivatives
require a broader stencil and the accuracy is more sensitive to grid quality (stretching,
skewness, etc). The spatial accuracy of viscous terms discretization, therefore,
merits special attention.

Figure 3: Coordinate definition for non-cartesian mesh

Two  main  operations  were  employed  in  the  previous  section  when  evaluating 
viscous  terms:  1)  line  integration  (with  a  trapezoidal  or  midpoint  rule),  and  2) 
averaging  to  obtain  values  of  variables  and  other  geometric  quantities  at  desired 
locations.  Formally,  both  operations  are  second  order  accurate.  However,  the  order 
depends  upon  both  the  grid  and  the  solution.  We  proceed  now  to  examine  various 
sources  of  error  which  can  reduce  the  accuracy  of  methods. 


Integrals  Evaluation 

Consider the one-dimensional case which evaluates a quantity u at point A
($u_A$) using the values of quantities at neighbouring points $A^+$, $A^-$ (Fig. 4):

$$\bar{u}_A = \frac{1}{\Delta S}\int_{A^-}^{A^+} u\,dS = \frac{1}{\Delta S}\,\frac{1}{2}(u_{A^-} + u_{A^+})\,\Delta S \qquad (8)$$

Taylor expanding $u_{A^-}$, $u_{A^+}$ and substituting in Eq. (8) we obtain:

$$\bar{u}_A = u_A + \frac{1}{2}(\Delta S^+ - \Delta S^-)\frac{\partial u}{\partial s} + \frac{1}{4}\left[(\Delta S^+)^2 + (\Delta S^-)^2\right]\frac{\partial^2 u}{\partial s^2} + \cdots \qquad (9)$$

Figure 4: Nonuniform mesh definition

Figure 5: Sheared mesh

The derivative terms on the right hand side represent the error that is made when
approximating $u_A$. Without stretching ($\Delta S^+ = \Delta S^-$), an evaluation of $u_A$ is second
order accurate, but reduces to first order when stretching is present. If $\alpha$ is the
stretching ratio ($\alpha = \Delta S^+/\Delta S^-$) of a mesh that is stretched exponentially, for the case of
a stress (first order derivative), we obtain:

$$\overline{\left(\frac{\partial u}{\partial s}\right)}_A = \left(\frac{\partial u}{\partial s}\right)_A + \frac{1}{2}\Delta S\,(\alpha - 1)\left(\frac{\partial^2 u}{\partial s^2}\right)_A + \cdots \qquad (10)$$

If the solution is linear ($u \sim s$), then $(\partial^2 u/\partial s^2)_A = 0$ and despite the stretching the
evaluation remains second order.
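
The loss of one order of accuracy on a stretched mesh can be observed numerically. The Python sketch below is a hedged illustration, not the authors' procedure: it writes the stretching ratio as `alpha` (the ratio of the spacing on the plus side to that on the minus side), assumes the stress is estimated as the difference of the two neighbouring values divided by the total spacing, and uses an exponential test profile chosen only for convenience.

```python
import math

def averaged_value_error(u, s_a, ds_minus, alpha):
    """Error of (u(A-) + u(A+)) / 2 as an estimate of u(A)."""
    ds_plus = alpha * ds_minus
    estimate = 0.5 * (u(s_a - ds_minus) + u(s_a + ds_plus))
    return abs(estimate - u(s_a))

def stress_error(u, du, s_a, ds_minus, alpha):
    """Error of (u(A+) - u(A-)) / (dS+ + dS-) as an estimate of du/ds at A."""
    ds_plus = alpha * ds_minus
    estimate = (u(s_a + ds_plus) - u(s_a - ds_minus)) / (ds_plus + ds_minus)
    return abs(estimate - du(s_a))

def observed_order(error_at, ds=1.0e-3):
    """Convergence order estimated from halving the mesh spacing."""
    return math.log2(error_at(ds) / error_at(0.5 * ds))

s_a = 0.3
order_uniform = observed_order(
    lambda ds: averaged_value_error(math.exp, s_a, ds, 1.0))
order_stretched = observed_order(
    lambda ds: averaged_value_error(math.exp, s_a, ds, 1.2))
stress_order_stretched = observed_order(
    lambda ds: stress_error(math.exp, math.exp, s_a, ds, 1.2))

# For a linear profile the stress estimate is exact even with stretching.
linear_stress_error = stress_error(lambda s: 2.0 * s + 1.0, lambda s: 2.0,
                                   s_a, 1.0e-3, 1.2)
print(order_uniform, order_stretched, stress_order_stretched, linear_stress_error)
```

The observed orders are close to 2 without stretching and close to 1 with a stretching ratio of 1.2, while the linear profile shows no stress error beyond round-off.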

Accuracy is also reduced by a sheared grid (Fig. 5). For a grid with kinks at
points C and B, the line integration around the secondary cell A'B'C'D' assumes
that straight lines connect the vertices (i.e. point C lies on a line passing through
points D' and C').

A more accurate approach would make use of

$$\frac{1}{2}(u_C + u_{C'})\,\Delta S_{CC'} + \frac{1}{2}(u_{D'} + u_C)\,\Delta S_{D'C}$$

for the line integration. However, if no stretching is present ($\Delta S_{CC'} = \Delta S_{D'C} = \Delta S/2$), we have instead:

$$\frac{1}{2}(u_{C'} + u_{D'})\frac{\Delta S}{2} + u_C\frac{\Delta S}{2}.$$

Since

$$u_{C'} = \frac{1}{2}(u_{NE} + u_E), \qquad u_{D'} = \frac{1}{2}(u_N + u_P),$$

there follows

$$\frac{1}{4}(u_P + u_E + u_{NE} + u_N)\frac{\Delta S}{2} + u_C\frac{\Delta S}{2} = u_C\,\Delta S,$$

as was used here. The way in which the averaging was defined to obtain the quantities
at points C, D', and C' eliminates the error for this case.

The line integration around A'B'C'D' is carried out when evaluating $(u_x)_{CB}$ at
the middle of face CB. Recall, however, that Green’s theorem gives the $u_x$ value
at the centroid of A'B'C'D', which does not necessarily coincide with the middle of
face CB, but introduces an error of higher order than that induced by stretching.
Finally, skewness of the cell does not affect the accuracy of a line integration.

It is apparent that errors due to stretching are the most serious; however, several
approaches can be adopted to compensate. Usually this involves weighting factors
based  on  the  grid  stretching  [14].  Criticism  for  such  approaches  includes  the  cost 
of  extra  computations,  the  treatment  at  boundaries  (either  flow  or  zonal),  extra 
storage  for  weighting  factors,  and  the  use  of  information  from  outside  of  each  cell, 
which  is  undesirable  when  unstructured  grids  are  present. 


Averaging 

Averaging  is  used  frequently  during  the  numerical  processing  of  viscous  terms. 
Simple  algebraic  averaging  is  formally  of  second  order  and  degrades  to  first  order 
when stretching is present. The algebraic averaging when obtaining $u_P$ is second
order only if

$$\vec{r}_P = \frac{1}{4}(\vec{r}_1 + \vec{r}_2 + \vec{r}_3 + \vec{r}_4)$$

where $\vec{r}_i$ are the position vectors of the corner points of a cell, and $\vec{r}_P$ is the corresponding
position vector for an interior point P. Area weighted averages and
interpolation  formulas  which  use  splines  can  be  employed.  Problems  associated 
with  such  approaches  are  similar  to  those  mentioned  above. 
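
The condition on the position of P can be illustrated with a linear field, for which the simple corner average is exact at the mean of the corner positions and picks up an error proportional to the offset anywhere else. The Python sketch below uses hypothetical helper names and an arbitrary distorted cell; it is meant only as a sanity check of the statement above.

```python
def corner_average(u, corners):
    """Simple algebraic average of u over the corner points."""
    return sum(u(x, y) for x, y in corners) / len(corners)

def mean_position(corners):
    """Average of the corner position vectors."""
    n = len(corners)
    return (sum(x for x, _ in corners) / n, sum(y for _, y in corners) / n)

def field(x, y):
    """Linear test field with gradient (3, -1.5)."""
    return 2.0 + 3.0 * x - 1.5 * y

corners = [(0.0, 0.0), (1.0, 0.1), (1.6, 1.2), (-0.1, 0.9)]
xm, ym = mean_position(corners)
average = corner_average(field, corners)

error_at_mean = abs(average - field(xm, ym))          # vanishes for a linear field
error_offset = abs(average - field(xm + 0.1, ym))     # first order in the offset
print(error_at_mean, error_offset)
```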

As was mentioned, other than grid quality, the factor which most influences accuracy
is the form of the solution itself. The more non-linear the solution, the larger
the errors. For a typical boundary layer profile, the outer edge region is sufficiently
non-linear to cause accuracy problems, in comparison to the inner portion of the
profile which is relatively linear.

Another  factor  which  introduces  errors  is  artificial  viscosity.  The  order  of  the 
usual  second  and  fourth  order  smoothing  operators  on  smooth  grids  reduces  for  non- 
uniform  grids  as  we  will  see  in  a  later  section.  Lastly,  all  of  the  above  errors  scale 
with  the  mesh  size.  Both  the  integration  and  averaging  errors  are  more  significant 
in  the  regions  where  the  grid  is  coarse. 

Limitations  on  higher  order  numerical  modeling 

To date, there have been very few finite volume schemes for Navier-Stokes equations
(especially for compressible flows) that correspond to higher than second order
of accuracy. There are a few possible reasons for that. Higher order schemes involve
more computations which may outweigh the advantage of using coarser grids. A
larger computational molecule usually is required and this poses problems in the
boundary treatment. Higher order accuracy often leads to tighter stability restrictions.
Lastly, extra storage then may be needed, and care must be taken to account
for higher than second order errors. These can still be of lower order than the
method, and have been ignored in the present schemes (e.g. the difference between
the centroid and center of a cell).

VISCOUS  GRID 

The  issue  of  spatial  accuracy  due  to  viscous  term  discretization  has  been  shown 
to  depend  substantially  upon  the  grid  quality.  On  the  other  hand,  the  accuracy  of 
inviscid  term  evaluations  is  generally  much  more  insensitive  to  the  grid  properties 
(stretching,  shearing).  The  second  most  important  factor  determining  accuracy  is 
the  solution  itself.  The  larger  its  derivatives,  the  bigger  is  the  error.  Gradients  in 
viscous  regions  are  generally  much  larger  than  those  in  inviscid  regions  (excluding 
typical  discontinuities).  It  should  be  apparent  that  a  distinction  must  be  made 
between  viscous  and  inviscid  grids.  We  now  focus  on  the  specific  grid  requirements 
in  viscous  regions. 

Number  of  nodes  across  the  shear  layer 

Usually, a shear layer requires $O(10)$ to $O(10^2)$ points; for example, a number
like 15 is common practice. If the Reynolds number (Re) is small, the layer is
relatively large, and the spacing between grid points is large (coarse grid). A larger
Re implies finer viscous grids. Many points are wasted in a gradual distribution
from viscous to outer inviscid regions, given the stretching restriction. The 15 points
in the case of a laminar layer are placed in a region with thickness $\delta \sim Re^{-1/2}$, while
in a turbulent layer $\delta \sim Re^{-1/5}$. Moreover, about two points are needed in the viscous
sublayer of a turbulent shear layer, which implies that the spacing at the wall $\Delta y_{min}$
should be such that

$$\Delta y^+_{min} = \frac{u_\tau\,\Delta y_{min}}{\nu} \approx 2, \qquad u_\tau = \sqrt{\frac{\tau_w}{\rho}}$$


Approximating  the  wall  stress  r  with  ,  and  nondimensionalizing  Ayn 

with  a  reference  length  L,  results  in 


A  J/mm  ~  —  • 

Re 


Stretching 

Equation (10) shows the lowest order error term to be $\frac{1}{2}\Delta S(\alpha - 1)\frac{\partial^2 u}{\partial s^2}$, which
should be much smaller than the first term $\frac{\partial u}{\partial s}$. In the linear region of a typical shear
layer profile $\frac{\partial^2 u}{\partial s^2}$ is relatively small and $\alpha$ can be much bigger than the stretching
that is allowed in the edge region, where $\frac{\partial^2 u}{\partial s^2}$ is larger. Hopefully the above term is
at least an order of magnitude smaller than the first term $\frac{\partial u}{\partial s}$:

$$\frac{1}{2}\Delta S\,(\alpha - 1)\frac{\partial^2 u}{\partial s^2} < \frac{1}{10}\left(\frac{\partial u}{\partial s}\right).$$

Linearizing the derivative and assuming $\Delta u \sim u$ implies that:

$$\alpha < 1.2$$

This suggests that the maximum allowable stretching ($\alpha - 1$) is 20%. This has been
empirically considered to be an upper limit for grid stretching, although in many
cases this value is exceeded.
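
The step from the inequality to the 20% limit can be written out explicitly. The following is a sketch of one way to read "linearizing the derivative", assuming the derivatives are replaced by difference quotients over one mesh interval and that the change in u over that interval is of the order of u itself; the stretching ratio is written as α.

```latex
\frac{\partial u}{\partial s} \sim \frac{\Delta u}{\Delta S} \sim \frac{u}{\Delta S},
\qquad
\frac{\partial^2 u}{\partial s^2} \sim \frac{\Delta u}{\Delta S^2} \sim \frac{u}{\Delta S^2}
\quad\Longrightarrow\quad
\frac{1}{2}\,\Delta S\,(\alpha - 1)\,\frac{u}{\Delta S^2} < \frac{1}{10}\,\frac{u}{\Delta S}
\quad\Longrightarrow\quad
\alpha - 1 < 0.2 .
```

Under these order-of-magnitude assumptions the mesh spacing and the solution value cancel, leaving a bound that depends only on the stretching ratio.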


Directionality

Generally, in a shear layer there are large differences in magnitude between
gradients in streamwise and normal directions, i.e. $\partial u/\partial y \gg \partial u/\partial x$, and this
implies $\Delta y \ll \Delta x$. Such directionality in the solution ‘imposes’ a similar constraint
on the grid. Viscous cells usually have large aspect ratios which can be of the
order of 100 in many cases. The directionality requirement is met much more easily
by quadrilateral rather than triangular meshes. This is a dominant reason for the
absence of triangles in viscous region computations.


Figure  6:  Node  and  cell  designations  for  smoothing  operators 

Adaptive  grids 

The resolution and stretching requirements impose serious problems on viscous
grid generation. The allowed stretching together with the smallest spacing at the
wall result in excess resolution in the inviscid region. Also, the location and thickness
of viscous regions are not known a priori. This results in considerable empiricism
in grid construction and its later modification upon examining the solution.
Adaptive locally embedded viscous grids provide local refinement using an adaptive
algorithm which senses the viscous regions during the course of the computation
[4,5,3]. A detailed examination of such methods is given in the chapter on adaptation
methods for viscous flows.

ARTIFICIAL  DISSIPATION 

Smoothing  that  is  accomplished  by  explicitly  adding  dissipation  is  employed  by 
the  vast  majority  of  existing  schemes,  especially  those  concerned  with  compressible 
flows.  There  are  two  main  types  of  such  artificial  smoothing:  one  is  used  to  capture 
shocks,  the  second  is  designed  to  damp  spurious  oscillations  throughout  the  field 
and  to  suppress  odd-even  decoupling  of  the  solution.  These  second  and  fourth  order 
smoothing  operators  are  examined  in  detail  in  order  to  address  the  relevant  issues  of 
viscous  region  contamination  by  artificial  smoothing  and  the  degradation  of  order 
with  grid  stretching. 


Second  order  (shock)  smoothing 

Second  order  smoothing  provides  damping  necessary  to  smear  a  shock,  which 
ideally  has  zero  thickness,  in  such  a  way  that  oscillations  are  avoided.  Since  it  is 
required  only  in  shock  regions,  a  switch  is  employed  to  turn  it  off  elsewhere. 

In  two  dimensions,  the  damping  term  may  have  the  form: 


$$\frac{\partial P}{\partial x}\,\frac{\partial u}{\partial x} + \frac{\partial P}{\partial y}\,\frac{\partial u}{\partial y}$$


Consider the specific discretization for a cell-vertex scheme at node 0 (Fig. 6).
The node receives contributions from each of the four surrounding cells. That
from cell A is:

$$S^{(2)}_{0A} = \frac{(P_3 + P_5) - (P_2 + P_0)}{P_3 + P_5 + P_2 + P_0}\,\{(u_3 + u_5) - (u_2 + u_0)\} + \frac{(P_2 + P_3) - (P_0 + P_5)}{P_2 + P_3 + P_0 + P_5}\,\{(u_2 + u_3) - (u_0 + u_5)\} \qquad (12)$$

Similarly, from cell D it is:

$$S^{(2)}_{0D} = \frac{(P_1 + P_4) - (P_2 + P_0)}{P_1 + P_4 + P_2 + P_0}\,\{(u_1 + u_4) - (u_2 + u_0)\} + \frac{(P_1 + P_2) - (P_4 + P_0)}{P_1 + P_2 + P_4 + P_0}\,\{(u_1 + u_2) - (u_4 + u_0)\} \qquad (13)$$

Similar expressions furnish contributions from cells C and B. Pressure differences
in the switch are normalized by the sum of the pressures at the four corners of
each cell. The sum of the smoothing distributions to the four nodes of each cell is
zero, which implies that the above operator is conservative. For the special case of
$\partial P/\partial x = \partial P/\partial y$, the computational molecule of the smoothing operator is the one shown
in Fig. 7 (a).

(a) Second order smoothing operator (b) Laplacian operator

Figure 7: Smoothing stencils

A question does arise with respect to the appropriate order of the pressure
switch. The purpose of the switch is to act only on large gradients (shocks) and not
on those which are small. The larger the stencil of the switch, the more likely it is to
'pick' relatively mild gradients which do not actually represent shocks. Moreover,
a higher order switch is computationally expensive. A proof follows which uses
Burgers equation $u u_x = \nu u_{xx}$ applied to a shock region of thickness $\delta$. From this
relation, it follows that $\frac{u^2}{\delta} \sim \frac{\nu u}{\delta^2} \Rightarrow \nu \sim u\,\delta$, which implies that the bigger the
smoothing coefficient, the more smeared the shock region will be. Continuing from
the previous relation, $\nu \sim \Delta u\,\Delta x$ but $\Delta u \sim \Delta P$, thus

$$\nu \sim \Delta P\,\Delta x$$

which implies that the pressure switch should be a first difference in pressure.
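
As an illustration of the cell contribution in Eq. (12), the following minimal Python sketch evaluates the term that cell A sends to node 0. The function name and the dictionary layout of the nodal values are assumptions made for this example; only the formula is taken from the text.

```python
def cell_a_contribution(P, u):
    """Second order smoothing contribution of cell A to node 0, Eq. (12).

    P and u map the node numbers 0, 2, 3, 5 of cell A (Fig. 6) to pressure
    and state-variable values (hypothetical data layout).
    """
    p_sum = P[0] + P[2] + P[3] + P[5]  # normalization: sum over the four corners
    switch_1 = ((P[3] + P[5]) - (P[2] + P[0])) / p_sum
    switch_2 = ((P[2] + P[3]) - (P[0] + P[5])) / p_sum
    diff_1 = (u[3] + u[5]) - (u[2] + u[0])
    diff_2 = (u[2] + u[3]) - (u[0] + u[5])
    return switch_1 * diff_1 + switch_2 * diff_2


# Illustrative values only: a uniform pressure field and a pressure jump.
uniform_pressure = {0: 1.0, 2: 1.0, 3: 1.0, 5: 1.0}
jump_pressure = {0: 1.0, 2: 1.0, 3: 2.0, 5: 2.0}
velocity = {0: 1.0, 2: 1.0, 3: 0.5, 5: 0.5}

smooth_value = cell_a_contribution(uniform_pressure, velocity)
shock_value = cell_a_contribution(jump_pressure, velocity)
```

With uniform pressure both switches vanish, so the term is inactive away from pressure jumps, which is the behaviour the switch is meant to provide.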


Fourth  order  smoothing 

The  fourth  order  smoothing  that  is  used  away  from  shocks  to  suppress  odd- 
even  modes  and  damp  spurious  oscillations  is  turned  off  near  shocks  because  it  is 
destabilizing.  The  operator  is  formed  in  two  steps.  The  second  order  difference 
operator  is  formed  in  the  first  step  (Fig.  6): 

$$\begin{aligned}
D^2_{0A} &= u_0 + u_2 + u_3 + u_5 - 4u_0 \\
D^2_{0B} &= u_7 + u_0 + u_5 + u_8 - 4u_0 \\
D^2_{0C} &= u_6 + u_4 + u_0 + u_7 - 4u_0 \\
D^2_{0D} &= u_4 + u_1 + u_2 + u_0 - 4u_0
\end{aligned}$$

Summing up contributions from the four cells surrounding node 0 results in a Laplace
stencil (Fig. 7 (b)).

The second step duplicates the first, replacing state variables by second order
differences from the first step.

$$\begin{aligned}
-D^4_{0A} &= D^2_0 + D^2_2 + D^2_3 + D^2_5 - 4D^2_0 \\
-D^4_{0B} &= D^2_7 + D^2_0 + D^2_5 + D^2_8 - 4D^2_0 \\
-D^4_{0C} &= D^2_6 + D^2_4 + D^2_0 + D^2_7 - 4D^2_0 \\
-D^4_{0D} &= D^2_4 + D^2_1 + D^2_2 + D^2_0 - 4D^2_0
\end{aligned} \qquad (15)$$
Figure 8: Sawtooth mode

It is illustrative to study how the smoothing operator acts to remove an odd-even
mode for the one-dimensional case shown in Fig. 8. The operator in one-dimensional
form is:

$$-D^4_0 = u_{-2} - 4u_{-1} + 6u_0 - 4u_1 + u_2, \qquad (16)$$

where $-2, -1, 0, 1, 2$ are neighbouring points (Fig. 8). Substitution of sawtooth
mode values in Eq. (16) shows that $D^4_0$ furnishes such a contribution to node 0 as
to reduce the odd-even mode $u_0$.
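
The substitution mentioned above can be checked numerically. The sketch below is a minimal Python illustration; the function name and the sawtooth amplitude are assumptions chosen for the example. It evaluates $D^4_0$ from Eq. (16) for a sawtooth of amplitude 0.01 and for a linear profile.

```python
def fourth_difference_contribution(values):
    """Return D4_0 from Eq. (16) for the five values u[-2], ..., u[2]."""
    um2, um1, u0, up1, up2 = values
    minus_d4 = um2 - 4.0 * um1 + 6.0 * u0 - 4.0 * up1 + up2
    return -minus_d4


amplitude = 0.01  # illustrative amplitude of the odd-even mode
sawtooth = [amplitude, -amplitude, amplitude, -amplitude, amplitude]
linear = [-2.0, -1.0, 0.0, 1.0, 2.0]

d4_sawtooth = fourth_difference_contribution(sawtooth)
d4_linear = fourth_difference_contribution(linear)
```

For the sawtooth, $D^4_0$ equals minus sixteen times the amplitude, so it is opposite in sign to $u_0$ and acts to reduce it, while a linear profile is left untouched.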


Combined  Second  and  Fourth  order  smoothing 

The combined second and fourth order smoothing operator has the following
form:

$$Sm = S^2 - \max(0, \sigma_4 - \sigma_2 \Delta P)\,D^4, \qquad (17)$$

where $S^2, D^4$ are the second and fourth order operators discussed above, $\sigma_2, \sigma_4$ are the
corresponding smoothing coefficients, and $\Delta P$ is the pressure switch. Near shocks,
the term $\sigma_2 \Delta P$ dominates over $\sigma_4$ and therefore $\max(0, \sigma_4 - \sigma_2 \Delta P)$ vanishes and
switches off the fourth order smoothing.
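
The switching behaviour of Eq. (17) can be sketched in a few lines of Python. The function name is hypothetical, and the pressure switch is assumed here to be supplied as a non-negative number; the coefficient values are the ones recommended later in this section.

```python
def fourth_order_coefficient(sigma2, sigma4, delta_p):
    """Effective fourth order coefficient max(0, sigma4 - sigma2 * delta_p).

    delta_p is the (assumed non-negative) pressure switch of Eq. (17).
    """
    return max(0.0, sigma4 - sigma2 * delta_p)


sigma2, sigma4 = 0.20, 0.0004

away_from_shock = fourth_order_coefficient(sigma2, sigma4, delta_p=0.0)
near_shock = fourth_order_coefficient(sigma2, sigma4, delta_p=0.1)
```

With these coefficients the fourth order smoothing is fully switched off once the pressure switch exceeds $\sigma_4 / \sigma_2 = 0.002$.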

Numerical experiments have been carried out with a node-based Navier-Stokes
scheme (described in the next section) to determine optimum values for the
smoothing coefficients. The example considered flow at Mach numbers 0.5 and 1.4 in a
channel with a bump on a 65 x 33 grid. Figures 9a,b show Cp distributions for
the subsonic case without and with fourth order smoothing only, and it is clear that
even a very small amount of dissipation suppresses the sawtooth mode. Figure 10
demonstrates contamination of the boundary layer by smoothing via wall skin
friction coefficient distributions for values of the smoothing coefficient equal to 0.004
and zero. Similarly, a supersonic case was used to study the second order smoothing
coefficient. Figures 11a,b show Cp wall distributions with fourth order smoothing
coefficient $\sigma_4 = 0.0005$ and shock smoothing coefficients $\sigma_2$ of 0.40 and 0.05. Clearly
the value of 0.40 smears the shock excessively, while the 0.05 value is too small
and pre-shock oscillations appear. A reasonable choice of smoothing coefficients is:
$\sigma_2 = 0.20$ and $\sigma_4 = 0.0004$.

(a) without smoothing (b) with smoothing $\sigma_4 = 0.0005$

Figure 9: Effect of fourth order smoothing on odd-even decoupling


Boundary  layer  contamination 

Smoothing  is  required  mainly  in  the  inviscid  regions;  very  little  is  needed  in 
viscous  regions.  The  basic  requirement  is  suppression  of  odd-even  decoupling.  The 
presence  of  physical  viscosity  does  not  suppress  such  decoupling  since  it  is  allowed  by 
the  viscous  terms  discretization,  as  was  shown  in  a  previous  section.  Nevertheless, 
the  presence  of  smoothing  in  viscous  regions  may  seriously  deteriorate  accuracy. 
The  way  in  which  smoothing  terms  affect  the  viscous  layer  solution  as  well  as  the 
resolution  requirements  to  avoid  the  error,  are  of  interest. 


The second and fourth order derivatives usually take large values within the
viscous region and therefore smoothing contaminates the viscous layer more than
other regions. For the model diffusion equation $u_t = \nu u_{yy}$, we have:

$$u_t = \nu u_{yy} + \frac{|u| + c}{CFL}\,\sigma_2\,\delta P\,(\Delta y)\,u_{yy} + \frac{|u| + c}{CFL}\,\sigma_4\,(\Delta y)^3\,u_{yyyy}, \qquad (18)$$

where the last two terms represent first and third order errors introduced by the
two smoothing operators. A CFL stability condition ($\frac{(|u| + c)\,\Delta t}{\Delta y} = CFL$) eliminates
the time-step $\Delta t$ from the above expression. To ensure that artificial viscosity is
much less than the physical viscosity: $\frac{|u| + c}{CFL}\,\sigma_2\,\delta P\,\Delta y \ll \nu$, which yields

$$\sigma_2 \ll \frac{CFL}{Re\,\delta P\,\Delta y}, \qquad (19)$$

in which $Re$ is the Reynolds number based on $|u| + c$. If $\delta P$ is negligible across the
shear layer, then relatively large values of $\sigma_2$ can be used. For example, for the case
$CFL = 1$, $Re = 10^6$, $\delta P = 0.1$, $\Delta y = 10^{-4}$, it is obtained that $\sigma_2 \ll 0.1$.
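
The worked example for relation (19) can be reproduced with a short Python sketch. The function and argument names are assumptions introduced for illustration; the numbers are those quoted in the text.

```python
def sigma2_upper_bound(cfl, reynolds, delta_p, delta_y):
    """Right-hand side of relation (19): CFL / (Re * deltaP * deltaY)."""
    return cfl / (reynolds * delta_p * delta_y)


# Values quoted in the text: CFL = 1, Re = 1e6, deltaP = 0.1, deltaY = 1e-4.
bound = sigma2_upper_bound(cfl=1.0, reynolds=1.0e6, delta_p=0.1, delta_y=1.0e-4)
```

The bound scales inversely with the pressure switch, so a vanishing $\delta P$ across the shear layer relaxes the restriction on the second order coefficient.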

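For a fixed value of the smoothing coefficient $\sigma_2$, the resolution of the shear layer
should be such that the ratio of physical to artificial viscosity satisfies
$O\!\left(\frac{\nu\,CFL}{(|u| + c)\,\sigma_2\,\delta P\,\Delta y}\right) \gg 1$, which gives

$$N = \frac{\delta}{\Delta y} \gg \frac{Re_\delta\,\sigma_2\,\delta P}{CFL}, \qquad (20)$$

$N$ being the number of grid points across the shear layer of thickness $\delta$ and
$Re_\delta = (|u| + c)\,\delta / \nu$. For the case of $CFL = 1$, $Re_\delta = 10^3$, $\delta P = 0.1$, $\sigma_2 = 0.1$,
the number of points within the layer should be $N \gg 10$ so that the real viscous
terms dominate. $Re_\delta$ is based on $|u| + c$, and therefore in nearly incompressible
cases it can take very large values, which implies that considerable resolution is needed.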

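For a fixed value of the smoothing coefficient $\sigma_4$, the resolution of the shear layer
should be such that $O\!\left(\frac{\nu\,u_{yy}}{\frac{|u| + c}{CFL}\,\sigma_4\,(\Delta y)^3\,u_{yyyy}}\right) \gg 1$, which
(with $u_{yyyy} \sim u_{yy}/\delta^2$ across a layer of thickness $\delta$) gives

$$N \gg \left(\frac{Re_\delta\,\sigma_4}{CFL}\right)^{1/3}. \qquad (21)$$

Comparison with relation (20) shows that this resolution requirement is much less
severe, as a result of the higher order of this operator. For example, if $CFL = 1$,
$Re_\delta = 10^3$, $\sigma_4 = 0.001$, the number of points within the layer is $N \gg 1$.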
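
The two resolution requirements, relations (20) and (21), can be compared side by side with a small Python sketch. The function names are hypothetical; the inputs are the example values quoted in the text.

```python
def min_points_second_order(re_delta, sigma2, delta_p, cfl):
    """Right-hand side of relation (20): Re_delta * sigma2 * deltaP / CFL."""
    return re_delta * sigma2 * delta_p / cfl


def min_points_fourth_order(re_delta, sigma4, cfl):
    """Right-hand side of relation (21): (Re_delta * sigma4 / CFL) ** (1/3)."""
    return (re_delta * sigma4 / cfl) ** (1.0 / 3.0)


n_second = min_points_second_order(re_delta=1.0e3, sigma2=0.1, delta_p=0.1, cfl=1.0)
n_fourth = min_points_fourth_order(re_delta=1.0e3, sigma4=0.001, cfl=1.0)
```

The cube root in relation (21) is what makes the fourth order requirement so much milder: a thousandfold increase in $Re_\delta$ raises that bound only tenfold, whereas the bound of relation (20) grows linearly.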

Several suggestions have been made in order to avoid a deterioration of accuracy
due to smoothing. One gradually reduces smoothing on approaching the wall by
reducing the values of the smoothing coefficients [2]. However, in many cases the
most serious contamination of the boundary layer does not occur at the wall but
in regions with larger $u_{yy}, u_{yyyy}$ gradients. Another method employs directional
smoothing by applying the operator to the streamwise direction only. This requires
a quadrilateral mesh, or a triangular one with a quadrilateral structure (e.g. triangles
which are formed by dividing quadrilaterals along a diagonal) [16]. A third approach
applies resolution in the viscous regions by means of local embedding (cell division),
which results in a reduction of cell size and therefore a reduction in the magnitude
of the smoothing error. Adaptive algorithms place finer grids in those regions with
higher gradients where smoothing is larger.


Grid  stretching  increases  smoothing  error 

The above smoothing operators introduce first and third order errors only for
uniformly spaced grids. The actual order can be demonstrated by considering a
one-dimensional stretched grid. A second order smoothing operator without the
pressure switch has the form: $S^2 = u_1 - 2u_0 + u_{-1}$.

A Taylor series expansion about point 0 leads to:
$S^2 = (h^+ - h^-)\,u_x + \frac{1}{2}\{(h^+)^2 + (h^-)^2\}\,u_{xx} + \ldots$. Assuming exponential stretching,
and with $\alpha = h^+/h^-$, the operator becomes:

$$S^2 = h^-(\alpha - 1)\,u_x + \frac{1}{2}(h^-)^2(\alpha^2 + 1)\,u_{xx} + \ldots \qquad (22)$$

The first order term $h^-(\alpha - 1)\,u_x$ appears, which increases the smoothing error and
makes the operator dispersive rather than dissipative. Similarly, the fourth order
difference operator, $u_{-2} - 4u_{-1} + 6u_0 - 4u_1 + u_2$, has the form for an exponentially
stretched mesh:

D*  =  (a  —  2 ){a  -  1  )h~ux  +  2a(l  +  ^)(a2  +  1)(/T)2u«  +  ...  (23) 

Again, the first order term $(\alpha - 2)(\alpha - 1)\,h^- u_x$ increases the error and makes
smoothing dispersive. It is to be noted that exponentially stretched meshes are widely
employed for viscous computations. The dissipative terms in Eqns. (22),(23) are
positive for any $\alpha$, which precludes the possibility of having negative damping for
some $\alpha$.
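
The expansion in Eq. (22) can be verified numerically. The following Python sketch is a minimal check under stated assumptions: the function names, the quadratic test profile, and the values of $h^-$ and $\alpha$ are all chosen for illustration. For a quadratic profile the two-term expansion is exact, so the difference operator and the Taylor estimate must agree.

```python
def second_difference(u, h_minus, alpha, x0=0.0):
    """S2 = u_1 - 2 u_0 + u_{-1} on a stretched grid with h_plus = alpha * h_minus."""
    h_plus = alpha * h_minus
    return u(x0 + h_plus) - 2.0 * u(x0) + u(x0 - h_minus)


def taylor_estimate(ux, uxx, h_minus, alpha):
    """First two terms of Eq. (22)."""
    return (h_minus * (alpha - 1.0) * ux
            + 0.5 * h_minus**2 * (alpha**2 + 1.0) * uxx)


def profile(x):
    # Quadratic test profile with u_x(0) = 3 and u_xx = 4 (illustrative choice).
    return 3.0 * x + 2.0 * x**2


h_minus, alpha = 0.1, 1.2
exact = second_difference(profile, h_minus, alpha)
estimate = taylor_estimate(ux=3.0, uxx=4.0, h_minus=h_minus, alpha=alpha)
uniform = taylor_estimate(ux=3.0, uxx=4.0, h_minus=h_minus, alpha=1.0)
```

Setting $\alpha = 1$ removes the $u_x$ term, which recovers the uniform-grid behaviour in which only the dissipative $u_{xx}$ term remains.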

AN EXPLICIT NODE-BASED FINITE-VOLUME SCHEME

A conservative finite-volume scheme developed for the Navier-Stokes equations
in [4] will be described. It is an explicit, time-marching scheme with the
state-variables stored at grid nodes. All necessary operations are completed within the
cell, which makes the scheme suitable for unstructured (locally embedded) meshes.
The discretization of the inviscid terms will be presented, then the treatment of the
viscous terms will be described in detail, and finally the code is examined in terms
of computer requirements and example cases.

Figure 12: Cell node and face designations

Inviscid  terms 

A one-step Lax-Wendroff-type integration scheme [10] was employed for
treatment of the convective terms of the Navier-Stokes system. The corresponding
integral relation over a cell is
$\frac{\partial}{\partial t}\iint_{cell\ area} U\,dx\,dy + \oint_{cell\ faces}(F\,dy - G\,dx) = 0$. The first
term, representing the change in time of $U$ over the cell area $S$, is discretized as
$\frac{\delta U_c}{\delta t}\,S$. The second term, representing the convective fluxes across the cell-faces, is
computed via the trapezoidal integration rule. Then (Fig. 12)

$$\begin{aligned}
\frac{\delta U_c}{\delta t}\,S = {} & \frac{F_{SW} + F_{SE}}{2}(y_{SW} - y_{SE}) - \frac{G_{SW} + G_{SE}}{2}(x_{SW} - x_{SE}) \\
& + \frac{F_{NW} + F_{SW}}{2}(y_{NW} - y_{SW}) - \frac{G_{NW} + G_{SW}}{2}(x_{NW} - x_{SW}) \\
& + \frac{F_{NE} + F_{NW}}{2}(y_{NE} - y_{NW}) - \frac{G_{NE} + G_{NW}}{2}(x_{NE} - x_{NW}) \\
& + \frac{F_{SE} + F_{NE}}{2}(y_{SE} - y_{NE}) - \frac{G_{SE} + G_{NE}}{2}(x_{SE} - x_{NE})
\end{aligned} \qquad (24)$$
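
The trapezoidal flux balance of Eq. (24) can be sketched as follows. This is a minimal Python illustration for a single scalar unknown; the function name, the dictionary layout keyed by corner names, and the test data are assumptions, and the face ordering follows the reading of Eq. (24) in which each face contributes the average flux times the corner-coordinate difference.

```python
def cell_flux_balance(x, y, F, G):
    """Sum of the trapezoidal face fluxes of Eq. (24) for one quadrilateral cell.

    x, y, F, G are dicts keyed by the corner names 'SW', 'SE', 'NE', 'NW'.
    The returned value corresponds to (delta U_c / delta t) * S.
    """
    # (first corner, second corner) for the south, west, north and east faces
    faces = [("SW", "SE"), ("NW", "SW"), ("NE", "NW"), ("SE", "NE")]
    total = 0.0
    for a, b in faces:
        f_face = 0.5 * (F[a] + F[b])  # trapezoidal average on the face
        g_face = 0.5 * (G[a] + G[b])
        total += f_face * (y[a] - y[b]) - g_face * (x[a] - x[b])
    return total


# Unit square cell (illustrative geometry).
x = {"SW": 0.0, "SE": 1.0, "NE": 1.0, "NW": 0.0}
y = {"SW": 0.0, "SE": 0.0, "NE": 1.0, "NW": 1.0}

constant = {k: 2.0 for k in x}
zero = {k: 0.0 for k in x}

uniform_balance = cell_flux_balance(x, y, constant, constant)
linear_f_balance = cell_flux_balance(x, y, F=dict(x), G=zero)  # F = x, G = 0
linear_g_balance = cell_flux_balance(x, y, F=zero, G=dict(y))  # F = 0, G = y
```

A uniform flux field gives a zero balance because the cell contour is closed, and the two linear fields each give minus the cell area, consistent with $U_t = -F_x - G_y$.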

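The above change in time over the cell ($\frac{\delta U_c}{\delta t}\,S$) must be distributed to its four
nodes according to:

$$\begin{aligned}
(\delta U)_{SW} &= \tfrac{1}{4}\{\Delta U_c - \Delta f_c - \Delta g_c\} \\
(\delta U)_{NW} &= \tfrac{1}{4}\{\Delta U_c - \Delta f_c + \Delta g_c\} \\
(\delta U)_{NE} &= \tfrac{1}{4}\{\Delta U_c + \Delta f_c + \Delta g_c\} \\
(\delta U)_{SE} &= \tfrac{1}{4}\{\Delta U_c + \Delta f_c - \Delta g_c\}
\end{aligned} \qquad (25)$$

where

$$\Delta f_c = \frac{\Delta t}{S}\,(\Delta F_c\,\Delta y^l - \Delta G_c\,\Delta x^l), \qquad
\Delta g_c = \frac{\Delta t}{S}\,(\Delta G_c\,\Delta x^m - \Delta F_c\,\Delta y^m)$$

and

$$\Delta F_c = \left(\frac{\partial F}{\partial U}\right)\Delta U_c, \qquad \Delta G_c = \left(\frac{\partial G}{\partial U}\right)\Delta U_c$$

$$\begin{aligned}
\Delta x^l &= 0.5\,(x_{NW} + x_{NE} - x_{SW} - x_{SE}) \\
\Delta y^l &= 0.5\,(y_{NW} + y_{NE} - y_{SW} - y_{SE}) \\
\Delta x^m &= 0.5\,(x_{NE} + x_{SE} - x_{NW} - x_{SW}) \\
\Delta y^m &= 0.5\,(y_{NE} + y_{SE} - y_{NW} - y_{SW}).
\end{aligned}$$

Here $\Delta U_c$ is the sum of the flux terms in Eq. (24).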
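
A minimal Python sketch of the distribution step of Eq. (25) is given below for a scalar unknown. The function name and the input values are assumptions made for illustration, and the sign pattern follows the reading of Eq. (25) in which the corners are ordered SW, NW, NE, SE.

```python
def distribute_to_nodes(dU_c, df_c, dg_c):
    """Distribute the cell change dU_c to its four corner nodes, Eq. (25)."""
    return {
        "SW": 0.25 * (dU_c - df_c - dg_c),
        "NW": 0.25 * (dU_c - df_c + dg_c),
        "NE": 0.25 * (dU_c + df_c + dg_c),
        "SE": 0.25 * (dU_c + df_c - dg_c),
    }


# Illustrative values only.
corner_changes = distribute_to_nodes(dU_c=0.8, df_c=0.2, dg_c=-0.1)
total_change = sum(corner_changes.values())
```

The second order terms cancel in the sum over the four corners, so the total distributed change equals the cell change, which is the discrete statement of conservation.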


Viscous  terms 

The  viscous  part  of  the  integral  Navier-Stokes  equations  is 

$$\iint \frac{\partial U}{\partial t}\,dx\,dy = \oint (R\,dy - S\,dx) \qquad (26)$$

and  is  discretized  by  using  two  different  cells.  One,  the  secondary  cell,  is  used  to 
evaluate  stresses  (first  order  derivatives),  and  another,  the  primary  cell,  is  employed 
to  calculate  the  viscous  terms  (second  order  derivatives). 

As  was  mentioned,  a  main  concern  when  discretizing  viscous  terms  is  to  keep 
the  stencil  small.  Figure  13  shows  the  primary  cell  ABCD  that  is  used  to  compute 
viscous  terms  at  node  O. 

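The unsteady term $\iint \frac{\partial U}{\partial t}\,dx\,dy$ is discretized as
$\frac{\partial}{\partial t}\iint U\,dx\,dy = \frac{\partial}{\partial t}(U_O\,S) = \frac{\Delta U}{\Delta t}\,S$,
where $S$ is the area of the primary cell ABCD. Therefore, the entire integral
equation takes the discrete form over ABCD:

$$\begin{aligned}
\Delta U &= \frac{\Delta t}{S}\oint_{ABCD}(R\,dy - S\,dx) \\
&= \frac{\Delta t}{S}\,\{\, R_{BA}\,\Delta y_{BA} - S_{BA}\,\Delta x_{BA} \\
&\qquad\ + R_{CB}\,\Delta y_{CB} - S_{CB}\,\Delta x_{CB} \\
&\qquad\ + R_{DC}\,\Delta y_{DC} - S_{DC}\,\Delta x_{DC} \\
&\qquad\ + R_{AD}\,\Delta y_{AD} - S_{AD}\,\Delta x_{AD} \,\}
\end{aligned} \qquad (27)$$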

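The terms $R_{BA}, S_{BA}, R_{CB}, S_{CB}$, etc. are stress and heat conduction terms that should
be evaluated at cell-faces. The secondary cell (defined by the points BA, B', C', DC,
Fig. 13) is employed for the evaluation of first order derivatives. A typical derivative,
$\frac{\partial U}{\partial x}$, is evaluated as follows:

$$\begin{aligned}
\frac{\partial U}{\partial x} &= \frac{1}{S_{CB}}\iint \frac{\partial U}{\partial x}\,dS = \frac{1}{S_{CB}}\oint U\,dy \\
&= \frac{1}{S_{CB}}\,\{U_B(y_{B'} - y_{BA}) + U_E(y_{C'} - y_{B'}) + U_C(y_{DC} - y_{C'}) + U_O(y_{BA} - y_{DC})\}
\end{aligned} \qquad (28)$$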
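
The contour evaluation of a first derivative used in Eq. (28) can be illustrated with a short Python sketch. It is written for a generic cell rather than for the specific secondary cell of Fig. 13; the function name, the unit-square geometry and the linear test field are assumptions chosen so that the result can be checked by hand.

```python
def x_derivative_by_contour(face_values, face_dy, area):
    """Approximate dU/dx as (1/S) * sum(U_face * dy_face) over the cell faces.

    face_values holds U on each face and face_dy the change in y along that
    face when the cell is traversed counterclockwise.
    """
    return sum(u * dy for u, dy in zip(face_values, face_dy)) / area


# Unit square traversed counterclockwise: south, east, north, west faces.
# Test field U = 3x + 2 sampled at the face midpoints (illustrative data).
face_values = [3.5, 5.0, 3.5, 2.0]
face_dy = [0.0, 1.0, 0.0, -1.0]

dudx = x_derivative_by_contour(face_values, face_dy, area=1.0)
```

On a Cartesian cell only the two faces with nonzero $dy$ contribute, which is how the contour form reduces to a central difference.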

This  viscous  terms  treatment  is  second  order  accurate  in  space,  and  first  order 
accurate  in  time  for  uniform  meshes.  Stretching  and  skewness  introduce  errors  when 
evaluating  line  integrals  as  was  discussed.  On  a  cartesian  mesh,  the  above  spatial 
discretization  reduces  to  classical  central  differencing.  The  choice  of  secondary  cells 
plays  a  crucial  role  in  the  behaviour  of  the  scheme  for  odd-even  decoupling  of  the 
solution.  The  above  choice  of  asymmetric  overlapping  secondary  cells  has  the  effect 
of  suppressing  the  one  and  two-directional  odd-even  modes  shown  in  Fig.  14,  where 
+  and  -  indicate  deviations  from  a  mean. 

In the first case the integration makes a positive contribution to node O, while
the second provides a negative contribution. Therefore, the scheme tends to suppress
both types of odd-even modes. Instead of a staggered secondary cell, another
possibility would use averaging in order to obtain the face stresses. For the CB
node, the averaging $R_{CB} = \frac{1}{2}(R_C + R_B)$ can be used. In this case, the grid cells
B, C are used as secondary cells in order to evaluate $R_B, R_C$. However, this method
allows the two-directional odd-even mode to exist, since it contributes zero to node
O, and thus does not cancel the sawtooth mode.

Equations (27),(28) can be manipulated in such a way that all necessary
operations can be performed within each grid cell without using any outside information.

Figure 13: Viscous terms discretization

The terms in (27) are split as follows:

$$\begin{aligned}
\Delta U = \frac{\Delta t}{S}\,\{\, & (R_B + R_A)\,\Delta y_{BA} - (S_B + S_A)\,\Delta x_{BA} \\
& + (R_C + R_B)\,\Delta y_{CB} - (S_C + S_B)\,\Delta x_{CB} \\
& + (R_D + R_C)\,\Delta y_{DC} - (S_D + S_C)\,\Delta x_{DC} \\
& + (R_A + R_D)\,\Delta y_{AD} - (S_A + S_D)\,\Delta x_{AD} \,\}
\end{aligned} \qquad (29)$$

Each of the terms (e.g. $R_B, S_B, R_A, S_A$, etc.) contains information only from cells
B, A respectively. It is assumed that the term $\frac{\Delta t}{S}$ is the same for the cells A, B, C, D.
Also, $\Delta y_{BA} = y_B - y_A \approx y_{B'} - y_{BA} \approx y_{BA} - y_{A'}$. For a stretched mesh, the above
assumptions introduce errors which are of the same order as those introduced by
the original line integrations around the primary and secondary cells. The following
example illustrates how the stress and heat conduction terms R, S are split, for the
case of the derivative $(U_x)_{CB}$. One obtains:

$$\begin{aligned}
(U_x)_{CB} &= \frac{1}{S_{CB}}\,\{U_E(y_{C'} - y_{B'}) + U_C(y_{DC} - y_{C'}) + U_O(y_{BA} - y_{DC}) + U_B(y_{B'} - y_{BA})\} \\
&= \frac{1}{S_C}\,\{U_E(y_{C'} - y_E) + U_C(y_{DC} - y_{C'}) + U_O(y_O - y_{DC})\} \\
&\quad + \frac{1}{S_B}\,\{U_O(y_{BA} - y_O) + U_B(y_{B'} - y_{BA}) + U_E(y_E - y_{B'})\}
\end{aligned} \qquad (30)$$

where $S_{CB} \approx S_C \approx S_B$.


(a)  one  direction  mode  (b)  two  directions  mode 

Figure  14:  Suppression  of  odd-even  modes 


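Equation (29) gives the contributions to node O by its surrounding cells A, B, C, D.
The contributions of each cell to its four corners (Fig. 12) follow from equation (29):

$$\begin{aligned}
(\Delta U)_{SW} &= \frac{\Delta t}{S}\,\{(+R_S\,\Delta y^l - R_W\,\Delta y^m) - (+S_S\,\Delta x^l - S_W\,\Delta x^m)\} \\
(\Delta U)_{NW} &= \frac{\Delta t}{S}\,\{(+R_N\,\Delta y^l + R_W\,\Delta y^m) - (+S_N\,\Delta x^l + S_W\,\Delta x^m)\} \\
(\Delta U)_{NE} &= \frac{\Delta t}{S}\,\{(-R_N\,\Delta y^l + R_E\,\Delta y^m) - (-S_N\,\Delta x^l + S_E\,\Delta x^m)\} \\
(\Delta U)_{SE} &= \frac{\Delta t}{S}\,\{(-R_S\,\Delta y^l - R_E\,\Delta y^m) - (-S_S\,\Delta x^l - S_E\,\Delta x^m)\}
\end{aligned} \qquad (31)$$

where $R_S$ is the part of the split stress which corresponds to the south face of the
cell and similarly for the other subscripts N, W, E. The above contributions to the
four corners sum up to zero, which implies that the scheme is conservative.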

Smoothing  is  required  for  the  inviscid  region,  and  the  operators  described  in  the 
previous  section  were  applied.  Ni’s  inviscid  multigrid  operator  [10]  was  also  applied 
to  accelerate  the  wave  propagation  and  convergence  in  the  inviscid  region,  but  it  is 
not really effective in a shear layer. There, the solution advances to steady-state at
a slower rate due to small time-steps, as is evident in the later iterations in Fig. 15.
Initially,  there  is  an  acceleration  towards  convergence  within  the  inviscid  region, 
followed  by  the  largest  errors  persisting  within  the  boundary  layer  region.  A  typical 
acceleration  factor  is  approximately  five. 

The solver requires storage of both state variables and their changes in time
($\delta U$) at each node, as well as viscosity and time-step values. The code takes 0.006
seconds per node per iteration in CPU time on a Vax/750 computer and has been
vectorized for the Alliant/Fx8 mini-supercomputer to attain a speed-up factor of
about 25.

Figure 15: Convergence history with multigrid

Example  cases 

Some  examples  of  computations  with  the  previously  described  scheme  follow. 

The first case is a 10% circular arc cascade in a subsonic flow [4] of $M = 0.5$
and $Re = 8 \times 10^3$ (Fig. 16(a)) with a 65 x 33 grid. The $C_f$ curve is compared with
[2,1] in Fig. 16(b). The second case is an 8% circular arc cascade with $M_\infty = 1.4$ and
Re  =  23  x  10s  [4].  Figure  17  illustrates  the  flow  field.  An  oblique  shock  forms  at  the 
leading  edge  and  is  reflected  at  the  upper  symmetry  boundary.  The  reflected  shock 
then  interacts  with  the  boundary  layer  at  the  trailing  edge  region,  which  separates 
and  reattaches  downstream. 

Another case is a RAE 2822 airfoil in transonic flow of $M_\infty = 0.75$ and
$Re = 6.2 \times 10^6$ at 2.70° angle of attack with a C-grid of 129 x 49. Fig. 18 shows Mach
number contours indicating a normal shock at the suction side which interacts with
the boundary layer. The case of a NACA 0012 airfoil in subsonic flow of $M_\infty = 0.5$
with $Re = 2.91 \times 10^6$ and 1.77° angle of attack, together with a comparison to
experiment, is presented in the chapter on adaptation methods for viscous flows.
CONCLUDING REMARKS


We have considered the finite-volume approach and its use to solve the
Navier-Stokes equations with an emphasis on discretization of the viscous terms. The use
of secondary cells in order to reduce the stencil and to ensure a compact scheme is
the most common approach. The choice of the cell arrangement for the evaluation
of derivatives is crucial for the suppression of odd-even modes and, therefore, for
the enhancement of convergence and reduction of artificial dissipation. Grid quality
challenges the accuracy of most of the current schemes, especially in viscous regions.
There, the grid should be constructed carefully with respect to the maximum
allowed stretching and resolution. Adaptive grids can offer flexibility in viscous grid
generation. The use of artificial dissipation, especially on stretched grids, can
deteriorate the accuracy of viscous solutions if care is not taken to reduce it in shear
layer regions. The current Navier-Stokes codes have been successful in predicting
quite complicated flows such as those involving shock boundary layer interactions.
However, their relatively low order of accuracy requires excess resolution and the
number of required operations per grid node is relatively large, resulting in quite
expensive computations. Adaptive algorithms, which use grids and equations that
recognize and follow the flow physics during a computation, seem to be promising
for future routine use of current finite-volume Navier-Stokes schemes in engineering
applications.
Abstract

An adaptive algorithm for turbulent flows, which has been
developed recently, is applied to airfoil flow fields for
relatively high Reynolds numbers. The adaptive algorithm
employs both grid embedding and redistribution, as well as
equation adaptation, in order to compute viscous flows.

Two kinds of geometries are considered. The first
involves a single element NACA 0012 airfoil; the second is a
two-element NLR airfoil, consisting of a main airfoil and a
flap. Specifically, the NACA 0012 airfoil is considered for
both subsonic and transonic flow. The flow past the
two-element airfoil configuration is low subsonic and considers
two flap deflection angles. The numerical results are
compared with corresponding NLR experimental measurements.
Important flow physics, such as shock-boundary layer
interactions and small separation bubbles, are 'captured' by the
adaptive algorithm with considerable detail.

INTRODUCTION 

In recent years, considerable progress has been made in
the development of numerical methods for the solution of
the Navier-Stokes equations. Most of those methods,
however, are not practical for the calculation of complicated
flows in a design environment. The primary reason is that
the efficiency of current algorithms is poor and makes it
difficult to obtain accurate results. Very fine resolution is
needed, which results in long computation times even with
the use of available supercomputers.
In general, the selection of the equations which are to be
solved, of the scheme and of the grid is determined a
priori by the user, and quite often some or all of the above
factors must be modified by the user in order to improve
the results. The robustness of current numerical schemes
as well as present computer capabilities have recently
allowed a dramatic change in this philosophy. General
algorithms have been developed which are flexible enough to
adaptively adjust the equations and the grid during the
solution procedure without intervention by the user [3,0,7,10]
and [12,15,17].

Such an adaptive algorithm has been developed in [10],
and is applied to turbulent flows around single and
two-element airfoils at relatively high Reynolds numbers. First,
the cases of a NACA 0012 airfoil in both subsonic and
transonic flow are presented. Then, flows around a two-element
NLR airfoil are considered. Experimental measurements
[16,18] are employed in order to evaluate the accuracy of the
algorithm. Important flow physics are 'captured' by the
algorithm in considerable detail.

ADAPTIVE  ALGORITHM 

Numerical scheme

The two-dimensional Reynolds-averaged Navier-Stokes
equations are employed. An explicit, finite volume
Lax-Wendroff-type numerical scheme which was developed
previously by Ni [14] for the Euler equations was used for
discretisation of the convective terms.

The above scheme has been extended to include viscous
terms as well [11] and is conservative. In order to
accelerate convergence to the steady state, a multiple grid method
[14] which acts only on the convective terms was used. Its
function is to accelerate the propagation of waves by using
grids coarser than the basic grid. Odd-even modes were suppressed
in the essentially inviscid portion of the flow with the aid
of a fourth order smoothing operator, while shocks were
captured using a second order Laplacian smoothing operator.

An important property of the above scheme is that all
operations can be performed within each cell without the need
for any external information. This is very useful in dealing
with unstructured grids. A more detailed description of the
integration scheme as well as an investigation of its properties
can be found in [10].


Adaptation 


The  procedure  begins  with  an  initially  coarse  grid  which 
is  embedded  in  regions  with  large  flow-gradients  (e.g. 
boundary-layers,  shocks,  wakes  etc).  The  algorithm  senses 
high  gradient  regions  and  automatically  divides  grid-cells  in 
such  regions. 

The method of grid redistribution is employed together
with adaptive grid embedding in order to yield a more
flexible algorithm for viscous flow computations. The use of
redistribution can be advantageous in those cases where
the number of nodes is sufficient, and the grid
rearrangement is not so severe as to result in a skewed and stretched
mesh. Redistribution has been used here in order to
increase grid clustering close to the surface for airfoil flows at
high  (0(10*))  Reynolds  numbers.  These  flows  typically  re¬ 
quire  the  grid  spacing  normal  to  the  wall  to  be  of  order  10~* 
chord lengths for airfoil problems. As a consequence, a large
number of grid embedding levels is required to decrease wall
normal spacing, which can result in excessive resolution in
regions away from the wall. Another important function of
grid redistribution is its facility for better alignment of the
grid with flow features. A choice that must be made before
the adaptive procedure starts is the initial grid that will be
employed. Inclusion of redistribution in the adaptive
algorithm makes the procedure much more flexible and effective
in accomplishing grid scale changes.

The Navier-Stokes equations apply for most flow fields of
engineering interest. Frequently, however, not all of their terms
are necessary to model flow physics. The viscous terms in
fact are expensive to compute but often are negligible over
large parts of the domain. The algorithm employs the
magnitude of the viscous terms as a criterion in order to decide
where the full Navier-Stokes system is required and where
a subset system (e.g. the Euler equations) would be
adequate. The border between two such regions is dynamically
defined by the algorithm and may change during the course
of the solution procedure.


Turbulence  model 

The algebraic model due to Baldwin and Lomax [2] was
used as a turbulent flow description. That model implicitly
assumes a structured mesh, and its implementation is
usually along lines normal to the surface. Unfortunately, for an
unstructured mesh (quadrilateral or triangular), such
normal mesh lines generally do not exist, since interfaces
interrupt such lines.

Our approach implements the model in a 'cell-wise'
manner. All necessary quantities are calculated at the
center of each cell. In this way we avoid using information
from outside of the cell, an approach which is common
when dealing with unstructured meshes generally. For
example, vorticity, which is an important quantity for both
the inner and outer layer formulation of the model, is
calculated using Green's theorem over each cell. Specifically,
$\omega = -(1/S_{cell})\oint_{cell}(u\,dx + v\,dy)$, where $S_{cell}$ is the cell area.
The distance of each cell from the wall is calculated and
stored whenever the grid is updated. The only quantities
that require information from outside of each cell in order
to be evaluated are the Baldwin-Lomax parameters $F_{max}$
and $U_{diff}$, which are used for the outer layer. In order to
evaluate the variables which characterise the entire shear
layer profile at each streamwise location, the cells are
arranged in streamwise stations. The stations consist of cells
from the initial mesh plus those cells that are introduced by
embedding.
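
The cell-wise evaluation of vorticity by a contour integral can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the algorithm: the function name, the trapezoidal rule on each edge and the solid-body rotation test field are assumptions. The sketch traverses the cell counterclockwise and returns $\partial v/\partial x - \partial u/\partial y$; the leading minus sign in the formula of the text corresponds to the opposite orientation or sign convention.

```python
def cell_vorticity(xs, ys, us, vs):
    """Vorticity dv/dx - du/dy of a cell from the circulation around it.

    Corners are listed counterclockwise; u and v are averaged on each edge
    (trapezoidal rule) and the area comes from the shoelace formula.
    """
    n = len(xs)
    circulation = 0.0
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        dx = xs[j] - xs[i]
        dy = ys[j] - ys[i]
        circulation += 0.5 * (us[i] + us[j]) * dx + 0.5 * (vs[i] + vs[j]) * dy
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return circulation / (0.5 * twice_area)


# Unit square cell in solid-body rotation u = -2y, v = 2x (illustrative data).
xs = [0.0, 1.0, 1.0, 0.0]
ys = [0.0, 0.0, 1.0, 1.0]
us = [-2.0 * y for y in ys]
vs = [2.0 * x for x in xs]

omega = cell_vorticity(xs, ys, us, vs)
```

Only the corner values of the cell itself are used, which is the property that makes this evaluation convenient on embedded, unstructured meshes.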

APPLICATIONS  TO  AIRFOIL  FLOWS 

The  adaptive  algorithm  is  applied  to  airfoil  flow  fields  for 
relatively  high  Reynolds  numbers.  Two  kinds  of  geometries 
are considered. The first involves a single-element NACA
0012 airfoil; the second is a two-element NLR airfoil,
consisting of a main airfoil with a flap.

Specifically, the NACA 0012 airfoil is considered for both
subsonic and transonic flow, and comparisons are made with
experiment. The flow past the two-element airfoil configuration
is low subsonic, and two flap deflection angles are considered.
The numerical results are compared with corresponding
NLR experimental measurements for the case of 20 degrees
flap deflection.

Single  Airfoil  Fields 

Two cases of adaptive numerical results for flow around a
NACA 0012 airfoil, under subsonic and transonic conditions,
are presented along with experimental results obtained
by an AGARD group [10]. Details of the flow fields are also
presented to demonstrate the range of capabilities of the
algorithm. The reported CPU times refer to an Alliant
FX/8 computer with three processors. A speed-up factor of
approximately 20, compared to a MicroVAX computer, was
attained.

NACA 0012 (subsonic)

The subsonic flow conditions were: $M_\infty = 0.50$, $Re = 2.91 \times 10^6$,
$\alpha = 1.77^\circ$, where both the angle of attack and Mach
number values are those suggested in [16] to account
for wind tunnel wall effects.

An initial C-mesh of 33x17 points was employed, with
two levels of embedding resulting in a final number of 5225
cells within the domain. The minimum grid normal spacing
at the airfoil leading edge was 9 x 10~* chord lengths,
while that for the trailing edge region was 9 x 10"*. The spacing
in the streamwise direction at the leading edge region
was 0.002, while the corresponding spacing at the trailing
edge was 0.020. Figure 1 illustrates the embedded grid that
was employed. The case took 4000 iterations to converge
(reduction of the residual magnitude by three orders) and
required 1.8 hours. The resulting flow field is depicted in
Figure 2 in terms of Mach number contour plots. The two
boundary layers thicken considerably as the trailing edge is
approached, but remain attached to the surface.

The experiment [16] provided pressure distribution data,
which are compared with the numerical results in Fig. 3; the
comparison shows very good agreement between numerics
and measurements. The computed $C_l$ of 0.192 compares
very well with the experimental value of 0.195.

NACA 0012 (transonic)

The transonic flow conditions were: $M_\infty = 0.754$, $Re = 3.76 \times 10^6$,
$\alpha = 3.02^\circ$. Again the angle of attack and Mach
number values are those suggested in [16] to account for
wind tunnel wall effects. An initial C-mesh of 65x41 points
is applied, with the farfield boundary placed 15 chord
lengths away from the airfoil. Three levels of embedding
are introduced by the algorithm (with the third level being
directional), resulting in the final grid illustrated in Fig. 4.
The final number of cells within the domain is 40440. The
minimum grid normal spacing at the airfoil leading edge is
2 x 10~* chord lengths, while the spacing at the trailing
edge region is 2 x 10~*. The spacing in the streamwise
direction at the leading edge region is 3 x 10“*, while the
corresponding spacing at the trailing edge is 0.004.

Adaptive redistribution of the initial mesh was applied as
depicted in Fig. 5. The figure shows both the solution just
before redistribution and the resulting redistributed grid. It
is observed that wall clustering at the leading edge region
and at the airfoil pressure side is increased. Conversely,
points are moved away from the wall at and downstream
of the shock-boundary layer interaction region, since the
boundary layer thickens considerably and significant flow
gradients exist away from the surface.

Allowance of directional grid embedding [11] at the third
level resulted in a reduction of the number of cells by 17290,
which represents a significant saving in both CPU time and
computer memory. The case took 5000 iterations to converge,
and the consumed computing time was 3.5 hours.

Figure 6 illustrates the flow field in terms of Mach number
contours. A shock forms on the suction side at 40% of the
chord, with the Mach number just upstream of the shock
being 1.31. The boundary layer on the suction side of the
airfoil starts to thicken upstream of the shock and separates
at X = 0.82, close to the trailing edge. On the other hand,
the pressure side boundary layer is considerably thinner and
remains attached to the surface. The wiggles that are observed
just upstream of the shock are odd-even modes. They
exist due to the low values of artificial viscosity that were
used so that the solution within the viscous region does not
become contaminated. Such oscillations do not induce
inaccuracies in the solution, since the shock location is predicted
accurately.

Interesting flow physics is revealed in the view of the
shock-boundary layer interaction region provided by Fig. 7.
The severe adverse pressure gradient that is induced by the
normal shock causes the boundary layer to thicken considerably
and eventually to separate at the foot of the normal
shock. A separation bubble is formed, and it is captured
in detail by the adaptive algorithm. The boundary layer
separates at X = 0.36 and reattaches at X = 0.52.

Next consider the grids that are created by the adaptive
procedure in the regions studied above. It is difficult
to portray on the same plot both inviscid and viscous region
grids due to the very different cell scales. In the following
plots, the boundary layer regions are not enlarged
(dark regions in the figures), but the kinds of viscous grids
will be described. The regions with different directionality
grids can be noticed more easily by observing the borders
between such grids. Fig. 8(a) focuses on the leading edge
region grid. There is directional embedding with the cells
being divided in their streamwise direction. The grid at the
shock-boundary layer interaction region (Fig. 8(b)) follows
the local flow physics in an accurate manner. The upper
'inviscid' part of the shock is 'captured' by directional
embedding with cells being divided along their streamwise
direction. Conversely, the boundary layer ahead of the shock
is resolved with directional division of cells along the direction
normal to the surface (the dark region of the plot between
X=0.2 and X=0.3). As the shock is approached, significant
streamwise gradients are induced in the boundary layer and
cells are now divided in both directions (between X=0.3
and X=0.5). Downstream of the shock, the boundary layer
cells are again divided along the normal direction only, since
there are no appreciable streamwise flow gradients (between
X=0.5 and X=0.7). The streamwise gradient becomes
appreciable again ahead of and at separation, which results
in division of the boundary layer cells in both directions
(region between X=0.7 and X=1.0).

The accuracy of the procedure may be examined by comparing
the experimental wall pressure coefficient distribution
with the corresponding numerical result (Fig. 9). The
shock location is predicted accurately, although the shock is a little
more smeared. A fourth level of embedding, which would provide
a more 'crisp' shock, was not allowed due to computer
limitations. The agreement remains good downstream of
the shock. However, as the trailing edge is approached, the
boundary layer does not resist the adverse pressure gradient
and separates, causing the pressure distribution to tend
to level out. Such trailing edge separation is not observed
in the experiment. The algebraic turbulence model that is
employed is believed to be largely responsible for this
behaviour, as has been concluded by comparative studies of
different turbulence models for transonic airfoils [5,9]. The
pressures on the pressure and suction sides match at the
trailing edge, and the somewhat lower pressure level at the
suction side influences the pressure side distribution, causing
it to deviate slightly from the experimental results. The
deviation is approximately the same over most of the pressure
surface. Unfortunately, corresponding measurements
for skin friction were not performed.


Two-Element Airfoil Fields

To date, virtually all numerical results for multi-element
airfoils originate from panel methods, Euler computations
[1,4,13,19,20], and viscous-inviscid interaction schemes [8].
However, there are cases for which the full Navier-Stokes
equations are a necessity in order to describe important
flow physics. The present adaptive algorithm appears to
be promising for such computations. The use of quadrilateral
meshes in the past has yielded quite awkward grids, and
therefore the introduction of adaptation is of some interest
for better mesh topologies. Finally, the use of quadrilateral
meshes, in contrast, for example, to a triangular mesh,
provides a test of their suitability for complex geometry
computations.

The basic airfoil section is an NLR 7301 airfoil. The flap
chord is 0.32c, where c is the main airfoil chord (Fig. 10).
The overlap region between the main airfoil and the flap is
0.053c, and a gap width of 0.026c was considered. During
the experiment only a single flap deflection angle $\delta$ of 20
degrees down was considered, and this was duplicated
in the numerical simulation. Lastly, the flow field in the case of
an undeflected flap (angle of 0°) was simulated numerically,
and revealed quite interesting flow physics. However, no
experimental results are available for the latter case. The
two-element configuration, with the two different flap positions,
is illustrated in Fig. 10.

The flow conditions were: $M_\infty = 0.185$, $Re = 2.51 \times 10^6$,
$\alpha = 6.0^\circ$. Both laminar and turbulent flow regions were
observed during the experiment, and the measured transition
locations on the surfaces were employed by the algorithm,
since a transition model has not been incorporated into the
solver. The same transition locations were employed for
both flap deflection angles. Specifically, the flow along the
main airfoil element suction side was assumed to be turbulent
downstream of X = 0.03c, while the corresponding
location on the pressure side was taken at X = 0.65c. The
flap pressure side flow is laminar, and the flow at the suction
side becomes turbulent at a distance of 0.20c downstream
of the flap leading edge.


Flap  deflection  of  20  degrees 


An initial H-grid of 77x103 points is employed. Two levels
of embedding were used, which resulted in 50188 cells
over the entire domain. Figures 11 and 12 show details of the
embedded grid. The generation of a grid which satisfies
certain resolution and stretching requirements, and which
follows the surfaces, proves to be quite difficult. However,
the above resolution and stretching requirements can be
relaxed to some extent when generating the initial mesh, since
the adaptive algorithm will place embedded grids in regions
where additional resolution is required. The minimum grid
normal spacing at both leading edges is 10~* chord lengths.

The algorithm took 5000 iterations to reduce the residual
by three orders of magnitude and required 24 hours on the
Alliant FX/8 with three processors. The free-stream Mach
number is quite low, which makes computations with an
explicit scheme more expensive due to the smaller time-steps
that are employed.
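
The cost penalty at low Mach number can be made concrete with a rough estimate. The Python sketch below assumes the usual acoustic CFL restriction for explicit schemes, dt = CFL * dx / (|u| + a), which is not stated in the text; the function name, the number of cells per chord and the CFL number are hypothetical and serve only to compare the two free-stream Mach numbers considered in this work.

```python
def steps_per_chord_transit(mach, cells_per_chord, cfl=1.0):
    """Explicit time steps needed for the free stream to travel one chord.

    Assumes the acoustic CFL limit dt = cfl * dx / (|u| + a), with dx the
    streamwise cell size and a the speed of sound, so that
    (chord / u) / dt = (chord / dx) * (1 + 1 / M) / cfl.
    """
    return cells_per_chord * (1.0 + 1.0 / mach) / cfl


# Two-element case (M = 0.185) against the transonic NACA 0012 case (M = 0.754).
low = steps_per_chord_transit(0.185, cells_per_chord=100)
high = steps_per_chord_transit(0.754, cells_per_chord=100)
ratio = low / high
```

Under these assumptions the low-speed case needs between two and three times as many steps per convective time unit for the same streamwise resolution.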

Let us consider some of the flow physics that may be observed.
Following the airfoil leading edge suction peak, the
boundary layer experiences an adverse pressure gradient and
separates at X = 0.027, forming a small bubble, as is apparent
in Fig. 13. The predicted beginning of the bubble
agrees with the measured location. It is important to note
that the appearance of the bubble is very sensitive to added
artificial viscosity. An increase of the fourth order smoothing
coefficient from 0.0004 (which was used in this case)
to 0.0008 causes the bubble to disappear. The extra
dissipation that was added caused the boundary layer to
remain attached. The same behaviour has been observed in
many cases during the course of the present work.

The overlap region between the primary airfoil and flap
element is of special interest. The boundary layer on the
pressure side of the airfoil is very close to separating ahead
of the trailing edge region, but recovers at the trailing edge
region. Fig. 14 shows the flow field in terms of Mach number
contours and velocity vector plots.

Finally, the flap leading edge region is shown in Fig. 15.
Observe that the stagnation point lies on the upper surface
of the flap despite the free-stream flow angle of attack being
6 degrees. The flow proceeds around the leading edge from
the upper side towards the lower side of the flap. Clearly,
a different gap and flap deflection angle combination may
provide more useful flow conditions.

The numerical results may be compared with corresponding
experimental results. Figure 16 shows good agreement
between numerical and measured pressure coefficient values
over both the main airfoil and the flap surfaces, with the
exception of the airfoil leading edge suction peak. Higher
resolution in the streamwise direction is required in order
to predict the magnitude of the suction peak accurately.
U-velocity profiles at various locations are compared with
experimental values in Fig. 17 and are believed to be the first
comparisons of that kind that are available. The locations
of both measurements and numerical profiles are given in
the figures. Locations on both the airfoil and flap surfaces
were chosen, with the profiles on the lower surfaces being
inverted in the figure. Since the surface grid locations do not
exactly coincide with the measurement locations, the closest
surface grid location was picked for the comparisons.
The comparison includes profiles in both laminar and
turbulent regions. In the turbulent region, the laminar
sublayer is not resolved by the two-level embedded grid. A
third level of adaptation is required, which would lead to a
prohibitively expensive computation in terms of computing
time and memory requirements for the computing system
that was available. Overall, the numerical results are in
reasonable agreement with the measurements. It should be
emphasised that, in addition to the slightly different surface
locations that were used, some small three-dimensional
effects were reported in the experiment.


Flap  deflection  of  0  degrees 

We proceed now to examine the previous two-element
configuration but with the flap now undeflected, i.e. 20
degrees upwards relative to the previous position. It was
anticipated that this 'off-design' undeflected position of the
flap would 'block' the flow to some extent and most probably
would cause the incoming airfoil pressure side boundary
layer to separate.

The same initial mesh of 77x103 points was employed and
two levels of embedding resulted in 53503 cells over the domain.
The case took 5000 iterations on the final grid to
reduce the residual error by two orders of magnitude, which
required about 30 hours of computing time. In view of the
vortex present in the overlap region and the separated flow
(as will be seen shortly), there may be a question whether
the flow is steady or unsteady. More iterations were not
allowed due to CPU-time limitations. However, the limitation
was not of overriding interest, since the purpose of the case
is to illustrate the capabilities of the algorithm in capturing
a variety of flow features that appear for this 'off-design'
flap deflection angle.

The flow in the overlap region between the two bodies
proves to be quite interesting. The boundary layer on
the pressure side of the main airfoil separates, as shown in
Fig. 18. A large recirculation area is formed at the airfoil
trailing edge region. The center of this area lies approximately
in the middle of the overlap region between the
airfoil and the flap. The boundary layer reattaches just upstream
of the trailing edge region. The separated shear layer
impinges on the flap leading edge region and is 'divided'
into two distinct layers. One portion of this separated shear
layer fluid follows the flap upper surface, while the remaining
portion passes around the flap leading edge and follows
the lower surface. The adaptive algorithm 'captures' in detail
regions with different flow orientation and with different
velocity magnitudes. The shear layer, which emanates from
the trailing edge lower side region, initially follows the trailing
edge lower surface direction, but very quickly (within
the next three grid points downstream) the velocity vectors
turn to be parallel to the flap upper surface. An enlarged
view of the flow at the flap leading edge region is quite informative,
as illustrated in Fig. 19. The part of the separated
shear layer which arrives from the airfoil lower surface, and
which follows the flap lower surface, separates and forms a
small bubble, as is evident from the two velocity vector
plots.


SUMMARY 


•  An adaptive algorithm has been applied to flows of relatively
high Reynolds numbers in both subsonic and
transonic flow. Complex geometries were also considered
and established the feasibility of using quadrilateral
meshes for multi-element airfoil configurations.
Comparisons with experimental results permitted an
evaluation of the accuracy of the algorithm.

•  Important flow physics have been 'captured' by the
adaptive algorithm with considerable detail. Flow phenomena
such as leading edge stagnation flows, shock-boundary
layer interactions, wakes, and separated
boundary layers including small bubbles, were among
the observed phenomena.

(b) shock/boundary layer interaction region (vertical scale enlarged)

Figure 8: Directional grids at various domain regions - Transonic NACA 0012 ($M_\infty = 0.754$, $Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$)

Figure 11: Two-level embedded grid - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 20^\circ$)

(a) Mach number contours
(b) Velocity vectors at initial grid nodes

Figure 13: Separation bubble at leading edge region - Two-Element airfoil ($M_\infty = 0.185$, ...), vertical scales enlarged

(a) Mach number contours (vertical scale enlarged)
(b) velocity vectors at initial grid nodes (vertical scale enlarged)

Figure 14: Overlap region flow field - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, ...)

(a) Mach number contours (horizontal scale enlarged)
(b) velocity vectors at initial grid nodes (horizontal scale enlarged)

Figure 15: Flap leading edge region flow field - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 20^\circ$)

Legend: (-) present work, (•, ■) experiment [18]

Figure 16: Pressure coeff. comparison with experimental results - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 20^\circ$)

(a) Mach number contours (vertical scale enlarged)
(b) velocity vectors at initial grid nodes (vertical scale enlarged)

Figure 18: Recirculating flow between airfoil and flap - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 0^\circ$)

velocity vectors at initial grid nodes (enlarged view)

Figure 19: Separation bubble at flap leading edge region - Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 0^\circ$)

ABSTRACT

An adaptive algorithm for simulation of laminar viscous flows, which consisted of grid
embedding and equation adaptation, is extended to include turbulent fields as well as time
accurate flow computations. Specifically, a combination of grid embedding and redistribution
is employed for more effective grid adaptation. An algorithm that allows spatial variation
of time-steps alleviates the stiffness of time accurate computations, which otherwise
require a globally minimum time-step. A method for implementing the Baldwin-Lomax
algebraic turbulence model with unstructured embedded meshes is developed. A numerical
treatment of grid interfaces is studied and evaluated. The adaptive algorithm is applied
to airfoil flow fields at relatively high Reynolds numbers of order $10^6$, and comparisons are
made with experimental data.

INTRODUCTION

Considerable  progress  has  been  made  in  the  development  of  numerical  methods  for  the 
solution  of  the  Navier-Stokes  equations.  Most  of  those  methods,  however,  are  impractical 
for  the  calculation  of  complicated  flows  in  a  design  environment.  The  primary  reason  is 
that  the  efficiency  of  the  current  algorithms  is  poor  and  makes  it  difficult  to  obtain  accurate 
results.  Very  fine  resolution  is  needed,  which  results  in  long  computation  times  even  with 
the  use  of  available  supercomputers. 

In general, the equations to be solved, the scheme and the grid are selected a priori
by the user before starting the solution procedure,
and quite often some or all of these choices must be modified by the user in order
to improve the results. The robustness of current numerical schemes as well as present
computer capabilities have recently allowed a dramatic change in this philosophy. General
algorithms have been developed which are flexible enough to adaptively adjust both the
equations and the grid during the solution procedure without intervention by the user. An
initially coarse grid is refined by embedding finer grids in regions with large flow gradients
(e.g. boundary layers, shocks, wakes, etc.). The algorithm senses high gradient regions and
automatically divides grid cells in such regions. This approach has been used for the
resolution of shocks in flow fields described by the Euler equations [1]. Also, an adaptive
algorithm for viscous, laminar flows, which utilizes equation adaptation as well, has been
developed in [2].

The method of grid redistribution can be employed together with adaptive grid embedding
in order to yield a more flexible algorithm for viscous flow computations. Also,
spatial variation of the time-step size may be used for time-accurate computations. All
cells within an embedded zone are then integrated with the same time-step, and time-steps
between zones vary by factors of two. This procedure allows the use of larger than globally
minimum time-steps in portions of the domain. A way of implementing the Baldwin-Lomax
algebraic turbulence model on embedded grids is proposed and investigated. A
numerical treatment of grid interfaces is also proposed and investigated. Finally, the
adaptive algorithm is applied to flows of relatively high Reynolds numbers in both subsonic
and transonic flow, which involve both single and two-element airfoils. Comparisons with
experiments permit an evaluation of the accuracy of the algorithm.
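
The spatially varying time-step described in this paragraph can be pictured with a short Python sketch. It assumes that the time-step halves with each embedding level, so that a zone at level k takes 2**k sub-steps while the coarsest zone takes one; the function name and the numerical values are hypothetical, and the interface treatment between zones is not represented.

```python
def zone_time_steps(dt_finest, num_levels):
    """Return (dt, substeps) per embedding level, level 0 being the coarsest.

    Assumes the time-step halves with every embedding level, so a zone at
    level k takes 2**k sub-steps while the coarsest zone takes one step.
    """
    finest = num_levels - 1
    schedule = []
    for level in range(num_levels):
        dt = dt_finest * 2 ** (finest - level)
        substeps = 2 ** level
        schedule.append((dt, substeps))
    return schedule


schedule = zone_time_steps(dt_finest=1.0e-4, num_levels=3)
# Every zone advances the same physical time over one coarse step.
elapsed = [dt * substeps for dt, substeps in schedule]
```

With a single global time-step every cell would need the four small steps of the finest zone; here only the finest zone does, which is the saving the procedure aims at.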


NUMERICAL  SCHEME 

The two-dimensional Reynolds-averaged Navier-Stokes equations are employed. An explicit,
finite volume Lax-Wendroff-type numerical scheme, which was developed previously
by Ni [3] for the Euler equations, is used for discretization of the convective terms.

The scheme has been extended to include viscous terms as well [2]. We illustrate the
discretization by consideration of the viscous term $u_{xx}$ at node E of Fig. 1. Using Green's
theorem for the volume abcd, we have:

$$u_{xx} = \frac{1}{S_{abcd}} \oint_{abcd} u_x \, dy = \frac{1}{S_{abcd}} \left[ (u_x)_{ab}\,\Delta y_{ab} + (u_x)_{bc}\,\Delta y_{bc} + (u_x)_{cd}\,\Delta y_{cd} + (u_x)_{da}\,\Delta y_{da} \right]$$

where $S_{abcd}$ is the area of the cell abcd, $\Delta y_{ab} = y_b - y_a$, etc. The first order derivative at
the face cd of the control volume abcd is evaluated employing the area EcFd. Similar
volumes are used for the other face derivatives. Thus,

$$(u_x)_{cd} = \frac{1}{S_{cd}} \left[ u_F\,\Delta y_F + u_c\,\Delta y_c + u_E\,\Delta y_E + u_d\,\Delta y_d \right]$$

where $S_{cd}$ is the area of EcFd, and

$$\Delta y_F = (\Delta y_{dF} + \Delta y_{Fc})/2$$

$$\Delta y_c = (y_H + y_E)/2 - (y_I + y_F)/2$$

$$u_c = (u_E + u_H + u_I + u_F)/4$$

etc. This discretization is conservative. The discretization of the convective terms allows
odd-even modes to appear in both directions, but the discretization of the viscous terms
eliminates such modes and makes the scheme more robust.
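
The building block of this discretization, a first derivative obtained from Green's theorem as a contour sum over the faces of a control volume, can be sketched in Python as follows. The function name, the vertex-based storage and the trapezoidal rule on each face are assumptions made for illustration; the sketch does not reproduce the specific averaging of the scheme in [2]. Applying the same construction once more to the face derivatives yields a second derivative such as $u_{xx}$.

```python
def green_gauss_gradient(nodes, values):
    """First derivatives (u_x, u_y) averaged over a polygonal control volume.

    nodes:  list of (x, y) vertices ordered around the control volume
    values: u at those vertices
    Uses u_x = (1/S) * contour integral of u dy and
         u_y = -(1/S) * contour integral of u dx,
    with the trapezoidal rule on every face.
    """
    n = len(nodes)
    sum_u_dy = 0.0
    sum_u_dx = 0.0
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n
        (xi, yi), (xj, yj) = nodes[i], nodes[j]
        u_face = 0.5 * (values[i] + values[j])
        sum_u_dy += u_face * (yj - yi)
        sum_u_dx += u_face * (xj - xi)
        twice_area += xi * yj - xj * yi
    area = 0.5 * twice_area
    return sum_u_dy / area, -sum_u_dx / area


# A linear field u = 2x + 3y + 1 is differentiated exactly on a skewed cell.
quad = [(0.0, 0.0), (1.0, 0.1), (1.2, 0.9), (-0.1, 1.0)]
u = [2.0 * x + 3.0 * y + 1.0 for x, y in quad]
grad = green_gauss_gradient(quad, u)
```

Because each face contribution is shared by the two control volumes adjacent to that face, sums of this kind are conservative by construction.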

In  order  to  accelerate  convergence  to  the  steady  state,  a  multiple  grid  method  [3]  which 
acts  only  on  the  convective  terms  was  used.  Its  function  was  to  accelerate  the  propagation 
of  waves  with  the  aid  of  coarser  than  basic  grids.  Odd-even  modes  were  suppressed  in  the 
essentially  inviscid  portion  of  the  flow  with  the  aid  of  a  fourth  order  smoothing  operator, 
while  shocks  were  captured  using  a  second  order  Laplacian  smoothing  operator. 
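
The effect of the two smoothing operators can be illustrated in one dimension. The Python sketch below applies a second-difference (Laplacian) term and a fourth-difference term on a periodic array; the function name, the periodic boundary treatment and the coefficient values are assumptions for this example, and the actual operators of the scheme act cell-wise in two dimensions.

```python
def smooth(u, eps2=0.0, eps4=0.0):
    """One pass of 1-D second- and fourth-difference smoothing (periodic)."""
    n = len(u)
    out = []
    for i in range(n):
        um2, um1, u0 = u[(i - 2) % n], u[(i - 1) % n], u[i]
        up1, up2 = u[(i + 1) % n], u[(i + 2) % n]
        d2 = um1 - 2.0 * u0 + up1                             # Laplacian-type term
        d4 = um2 - 4.0 * um1 + 6.0 * u0 - 4.0 * up1 + up2     # fourth difference
        out.append(u0 + eps2 * d2 - eps4 * d4)
    return out


# An odd-even (sawtooth) mode is damped by the factor 1 - 16 * eps4 per pass.
sawtooth = [(-1.0) ** i for i in range(8)]
damped = smooth(sawtooth, eps4=0.01)
# A uniform field is left untouched by both operators.
uniform = smooth([2.0] * 8, eps2=0.1, eps4=0.01)
```

The fourth difference vanishes for smooth (up to cubic) variations and is largest for the sawtooth mode, which is why it removes odd-even oscillations with little effect on the resolved solution.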

An  important  element  of  the  above  scheme  is  that  all  operations  can  be  performed  in  a 
piecewise  sense  within  each  cell,  without  the  need  for  any  external  information.  This  is  very 
useful  in  dealing  with  unstructured  grids.  A  more  detailed  description  of  the  integration 
scheme  as  well  as  investigation  of  its  properties  can  be  found  in  [2]. 


COMBINATION OF GRID EMBEDDING AND REDISTRIBUTION

Accuracy is achievable with minimal additional computational effort by embedding
several levels of finer grids only in those regions of the domain where important features
exist. This can be accomplished simply by subdividing cells of the initial coarse grid in
both cell directions. Proceeding in that way, the embedded and initial grids are
topologically similar, and if the initial grid is uniform and orthogonal, these desirable
properties will characterize the embedded meshes as well. However, depending on cell
and feature orientation, there are situations in which resolution is needed primarily in one
direction in the vicinity of the feature. In that case it is advantageous to divide the cell only
in that direction and thus avoid the creation of unnecessary cells (directional refinement)
[2].
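
The two kinds of subdivision can be sketched for a single quadrilateral cell. In the Python example below the corner ordering, the mode names "both", "i" and "j", and the use of edge midpoints are assumptions made for illustration; the data structures of the actual algorithm are described in [2] and are not reproduced here.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def subdivide(cell, mode="both"):
    """Split a quadrilateral cell given by corners (a, b, c, d) in order.

    mode="both": four children (division in both cell directions);
    mode="i":    two children, splitting edges a-b and d-c;
    mode="j":    two children, splitting edges b-c and a-d.
    """
    a, b, c, d = cell
    ab, bc, cd, da = midpoint(a, b), midpoint(b, c), midpoint(c, d), midpoint(d, a)
    if mode == "i":
        return [(a, ab, cd, d), (ab, b, c, cd)]
    if mode == "j":
        return [(a, b, bc, da), (da, bc, c, d)]
    if mode == "both":
        centre = midpoint(ab, cd)
        return [(a, ab, centre, da), (ab, b, bc, centre),
                (centre, bc, c, cd), (da, centre, cd, d)]
    raise ValueError("mode must be 'both', 'i' or 'j'")


def area(cell):
    """Signed area of a polygon by the shoelace formula."""
    total = 0.0
    for k in range(len(cell)):
        (x0, y0), (x1, y1) = cell[k], cell[(k + 1) % len(cell)]
        total += x0 * y1 - x1 * y0
    return 0.5 * total


parent = ((0.0, 0.0), (2.0, 0.2), (2.3, 1.5), (-0.1, 1.2))
children_both = subdivide(parent, "both")   # four cells
children_dir = subdivide(parent, "i")       # two cells (directional refinement)
```

Directional refinement halves the spacing in one direction with two children instead of four, which is the source of the cell savings reported for directional embedding.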

The earlier approach to increasing local grid resolution has been a redistribution of
the existing grid points such that points are clustered in regions of interest [4]. Since the
number of grid points is fixed, the clustering in one region results in less resolution in other
regions. For cases in which the initial grid does not include enough points, the procedure
frequently results in skewed and stretched grids which degrade the accuracy of the scheme.

However, the use of redistribution can be advantageous when the number of nodes is
sufficient, and the grid rearrangement is not so severe as to result in a skewed and stretched
mesh. Redistribution has been used here in order to increase grid clustering close to the
surface for airfoil flows at high [$O(10^6)$] Reynolds numbers. Such flows typically require
grid spacing normal to the wall to be of order 10'* chord lengths for airfoil problems. As
a consequence, if embedding alone is used, a large number of embedding levels is required
to decrease the spacing, which often results in excessive resolution in regions away from the wall.

Another important function of redistribution is to provide a better alignment of the
grid with flow features. A choice of initial grid must be made before the adaptive procedure
starts. However, an initial mesh may be misaligned with the emerging flow features, since
the solution is not known in advance. Also, it should be noted that a level of grid embedding
changes grid scales only by a factor of two, and this may not be sufficient. Inclusion of
redistribution as a possibility in the adaptive algorithm makes the procedure much more
effective in accomplishing grid scale changes.

A redistribution algorithm has been developed to modify the initial grid adaptively and
align it with the developing flow features. A measure of grid resolution in the direction
normal to a surface is given by the values of $y^+ = u_\tau y / \nu$, with
$u_\tau = \sqrt{\tau_w / \rho_w}$ being the wall friction
velocity. A criterion based on the $y^+$ values can be employed either to ‘attract’ points
toward  the  wall,  or  to  ‘repel’  them  away  from  the  surface  so  that  a  specific  y+  value  is 
attained.  The  grid  points  are  moved  along  the  normal  to  the  wall,  and  therefore  the 
resulting  grid  is  orthogonal  to  approximately  the  same  degree  as  the  initial  mesh.  In  other 
words,  this  1-D  grid  motion  does  not  result  in  a  skewed  mesh.  On  the  other  hand,  grid 
embedding introduces enough points into the domain so that excess wall clustering of
points does not result in insufficient resolution elsewhere in the domain.

Specifically,  the  redistribution  algorithm  is  as  follows: 

•  integrate  on  the  initial  mesh  for  a  number  of  iterations 

• calculate the $y^+$ values at the nodes which are adjacent to the surface. Then evaluate
both the average and the standard deviation of the $y^+$ distribution, denoted here
$y^+_{ave}$ and $y^+_{sdev}$. The specified $y^+$ value to be attained is
$y^+_{specified} = y^+_{ave} - a\, y^+_{sdev}$. The value of $a$ was chosen to be
0.4 for this work.

• move the nodes so that $y^+ = y^+_{specified}$ adjacent to the surface. A Laplacian filter
is applied to this motion in order to avoid skewed cells that would result from very
different motions of neighbouring nodes. Finally, rescale all distances by which nodes
will be moved so that the maximum distance is less than a certain value. This may
be useful when a limited number of nodes move a very large distance compared to
the rest of the nodes (e.g., near stagnation or separation points).
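
The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this work: it assumes that $y^+$ varies linearly with wall distance for the first node off the wall, it takes the Laplacian filter to be a 1-2-1 averaging pass along the surface, and the names `redistribute_wall_nodes`, `max_move` and `n_smooth` are hypothetical.

```python
import numpy as np

def redistribute_wall_nodes(dist, y_plus, a=0.4, max_move=None, n_smooth=1):
    """Return new wall distances for the nodes adjacent to the surface.

    dist     : current normal distance of each wall-adjacent node
    y_plus   : y+ value computed at each of those nodes
    a        : coefficient in y+_specified = mean(y+) - a * std(y+)
    max_move : optional cap on the largest displacement (hypothetical name)
    n_smooth : number of 1-2-1 smoothing passes (assumed form of the filter)
    """
    dist = np.asarray(dist, dtype=float)
    y_plus = np.asarray(y_plus, dtype=float)

    # Target value from the average and standard deviation of y+.
    y_target = y_plus.mean() - a * y_plus.std()

    # Assumption: y+ is proportional to wall distance at each surface location,
    # so the distance that gives y+ = y_target is dist * y_target / y+.
    move = dist * (y_target / y_plus) - dist

    # Smooth the displacements along the surface (end values are kept).
    for _ in range(n_smooth):
        smoothed = move.copy()
        smoothed[1:-1] = 0.25 * (move[:-2] + 2.0 * move[1:-1] + move[2:])
        move = smoothed

    # Rescale all displacements if the largest one exceeds the cap.
    if max_move is not None:
        largest = np.abs(move).max()
        if largest > max_move:
            move *= max_move / largest

    return dist + move, y_target
```

Nodes whose $y^+$ exceeds the target receive a negative displacement (they are 'attracted' toward the wall), while nodes with a smaller $y^+$ are 'repelled'. Because the motion is along the wall normal only, the sketch needs nothing beyond the scalar distance of each node.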

It  must  be  emphasized  that  the  sole  purpose  of  the  redistribution  procedure  is  to 
yield  a  grid  which  is  in  better  alignment  with  the  developing  flow  features.  Choice  of  the 
coefficient $a$ and of other parameters (essentially the value of $y^+_{specified}$) is not crucial to
the procedure. The following example of a NACA 0012 airfoil in transonic flow ($M_\infty =
0.754$, $Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$) illustrates the validity of the procedure. Figure 2(a)
shows  the  flow  field  after  1000  iterations  on  an  initial  mesh  of  65x41  nodes.  A  shock  appears 
on  the  suction  side  of  the  airfoil  and  induces  an  adverse  pressure  gradient,  which  causes 
the  boundary  layer  to  thicken  considerably.  The  initial  grid  is  clustered  at  the  surface  and 
it  is  apparent  that  such  wall-clustering  is  unnecessary  where  the  boundary  layer  is  thick. 
On the other hand, the leading edge region requires more resolution. Figures 2(b) and 2(c) show
the initial and redistributed grids close to the airfoil. Comparing the two, it is observed
that points have been ‘attracted’ in the leading edge region and on the pressure side, while
nodes have been ‘repelled’ on that part of the suction side over which the boundary layer
is  thick. 


TEMPORAL  ADAPTATION 

For  explicit  methods,  the  time  step  size  is  frequently  restricted  by  numerical  stability 
considerations  such  as  the  CFL  condition.  The  time-step  is  proportional  to  cell  size,  and 
the  cell  dimensions  for  adjacent  embedded  zones  usually  differ  by  a  factor  of  two.  In  such 
cases,  the  time  steps  allocated  to  each  cell  may  be  grouped  into  temporal  zones.  All  cells 
corresponding  to  a  given  temporal  zone  then  are  integrated  with  the  same  time  step,  and 
the  time  steps  between  temporal  zones  differ  by  factors  of  two  as  illustrated  in  Figure  3. 
The maximum time-step in the domain is $2^n$ times larger than the minimum step, where
$n$ is the number of temporal zones [5,2]. Thus, the cells of temporal level 1 are integrated
twice using a time step $\delta t$, before those of temporal level 0 are integrated once using
a time step of $2\,\delta t$.

This  procedure  allows  a  spatial  variation  of  time  steps  while  simultaneously  maintaining 
time  accuracy.  In  the  present  approach  the  temporal  zones  coincide  with  the  embedded 
zones,  which  simplifies  the  algorithm  considerably,  compared  to  a  method  in  which  the 
temporal  zones  are  independent  from  the  embedding  zones  [6].  Figure  3  illustrates  the 
concept  with  the  temporal  zones  coinciding  with  the  spatial  ones.  This  splitting  of  the 
time-steps  according  to  the  embedded  zones  saves  CPU  time  since  not  all  of  the  cells  are 
marched  in  time  with  the  minimum  time-step  that  is  found  for  the  entire  domain.  Instead, 
only  those  cells  that  are  in  the  embedding  zone  containing  the  cell  with  the  minimum  time- 
step are integrated with that minimum time-step; the remaining cells, which lie in other
zones, are marched at time-steps that are multiples of the globally smallest time-step.
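
The ordering of the sub-steps and the resulting saving can be illustrated with a short Python sketch. It is only a bookkeeping model under stated assumptions: temporal level k is taken to use a time-step of the coarsest step divided by 2**k, the cost is measured purely as the number of cell updates (interface handling is ignored), and the function names are hypothetical.

```python
def subcycle_order(max_level, level=0):
    """Order in which temporal levels are advanced over one coarsest step.

    Level k uses a time-step of dt_coarse / 2**k, so it is advanced twice
    for every single advance of level k - 1, and finer levels go first.
    """
    if level == max_level:
        return [level]
    finer = subcycle_order(max_level, level + 1)
    return finer + finer + [level]


def work_ratio(cell_levels):
    """Cell updates with temporal zones divided by the updates needed when
    every cell uses the globally minimum time-step (one coarsest step)."""
    max_level = max(cell_levels)
    zoned = sum(2 ** k for k in cell_levels)
    uniform = len(cell_levels) * 2 ** max_level
    return zoned / uniform


# One embedding level: level-1 cells are advanced twice, then level 0 once.
print(subcycle_order(1))                     # [1, 1, 0]
# 300 coarse cells and 100 embedded cells: 500 updates instead of 800.
print(work_ratio([0] * 300 + [1] * 100))     # 0.625
```

The update count is an upper bound on the achievable saving, since it leaves out the cost of the interface treatment and of any operations performed on all cells at every sub-step.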

As a test of the time accurate procedure, the model case of a channel flow with a forced
oscillation of the inlet Mach number was considered (Figure 4). The inlet flow was varied
sinusoidally according to $M_\infty = 0.8 + 0.04 \sin t$, and a low Reynolds number (10s) was
chosen in order to reduce the CPU time required by the basic explicit integration scheme.
Only  one  level  of  embedding  was  used  as  shown  in  the  figure.  Significant  temporal  gradients 
exist  over  the  entire  domain,  since  the  solution  basically  follows  the  inlet  oscillation  at  all 
grid nodes. Both curves in Fig. 4 represent time histories of the U-velocity component
at  a  specific  node  of  the  domain.  After  approximately  three  periods,  the  entire  flow  field 
oscillates  with  the  same  period  as  the  inlet.  The  agreement  between  the  embedded  grid 
and  the  corresponding  globally  fine  grid  is  very  good.  Similar  agreement  was  observed  for 
different  locations  within  the  domain,  lying  in  both  viscous  and  inviscid  portions  of  the 
domain. Compared with the case in which the globally minimum time-step is used for all
cells, temporal adaptation yields a speed-up of 20% in computing time.


AN  ALGEBRAIC  TURBULENCE  MODEL  WITH  EMBEDDED  GRIDS 
The algebraic turbulence model due to Baldwin and Lomax [7] is a two-layer, mixing-length-type
model. The quantities required to evaluate the eddy viscosity can be divided
into two groups: those that characterize each cell (local quantities) and others that
characterize an entire boundary layer profile (global quantities). The vorticity $\omega$ and the distance
$y$ from the center of the cell to the wall are both local cell-based quantities. The friction
velocity $u_\tau$, as well as the model parameters $y_{max}$, $F_{max}$, $U_{DIF}$, characterize an entire
profile at a certain streamwise location. The latter quantities cannot be obtained in the
normal way, since the presence of unstructured embedded grids requires information to be
restricted to that within each cell. The turbulence model is usually implemented along
lines perpendicular to the surface. However, with an embedded mesh continuous normal
mesh lines do not necessarily exist because of interrupting interfaces.

The present approach applies the model in a ‘cell-wise’ manner, based on parameters
which are known at the center of each cell. This is consistent with the overall approach
when dealing with unstructured meshes, i.e., not using information from outside of a cell.
The vorticity is calculated using Green’s theorem over the cell:
$\omega = -\frac{1}{S_{cell}} \oint (u\,dx + v\,dy)$,
where $S_{cell}$ is the cell area. The distance of each cell from the wall is calculated and stored
whenever the grid is updated. In order to evaluate the variables which characterize the
entire shear layer profile at each streamwise location, the cells are arranged in streamwise
stations  as  illustrated  in  Fig.  5.  The  stations  consist  of  cells  from  the  initial  mesh  plus  those 
cells  that  are  introduced  by  the  first  and  second  embedding  levels.  During  the  higher  than 
second  level  refinements,  no  additional  stations  are  created  and  the  new  cells  are  assigned 
to  the  previously  created  stations. 
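
The cell-wise vorticity evaluation can be sketched as follows. This is an illustrative Python version, not the solver's implementation: it assumes velocities stored at the cell corners, integrates each edge with the trapezoidal rule, and returns $\partial v/\partial x - \partial u/\partial y$. The cell area is computed as a signed quantity, so the result does not depend on whether the corners are listed clockwise or counter-clockwise; with an unsigned area, the sign in front of the contour integral depends on the traversal direction.

```python
import numpy as np

def cell_vorticity(xy, uv):
    """Vorticity dv/dx - du/dy of one cell from its corner values.

    xy : (n, 2) corner coordinates, listed in order around the cell
    uv : (n, 2) velocity components (u, v) at the same corners
    """
    xy = np.asarray(xy, dtype=float)
    uv = np.asarray(uv, dtype=float)
    xy_next = np.roll(xy, -1, axis=0)
    uv_next = np.roll(uv, -1, axis=0)

    # Trapezoidal rule on each edge for the contour integral of u dx + v dy.
    d = xy_next - xy
    mean_uv = 0.5 * (uv + uv_next)
    circulation = np.sum(mean_uv[:, 0] * d[:, 0] + mean_uv[:, 1] * d[:, 1])

    # Shoelace formula for the signed cell area.
    area = 0.5 * np.sum(xy[:, 0] * xy_next[:, 1] - xy_next[:, 0] * xy[:, 1])
    return circulation / area
```

For a velocity field that varies linearly over the cell the trapezoidal rule is exact on each edge, so solid-body rotation ($u = -y$, $v = x$) on a unit square returns a vorticity of 2.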

The station quantities $y_{max}$, $F_{max}$ are calculated by scanning through all cells that
belong to each station. The model function $F$ is formed for each cell, and its maximum
value $F_{max}$, which occurs at the station cell with distance from the wall equal to $y_{max}$, is
found. The wall quantity $u_\tau$ is evaluated as an average from the station cells which are
adjacent to the wall [2]. Finally, the eddy viscosity values which are evaluated at the cell
centers are interpolated to the nodes using a weighted interpolation. At each node (e.g., 0
in Fig. 6) the eddy viscosity value is evaluated employing the corresponding values at the
centers of the surrounding cells, according to the formula:

$\mu_{t_0} = (\ldots)\,\mu_{t_1} + (\ldots)\,\mu_{t_2} + (\ldots)\,\mu_{t_3} + (\ldots)\,\mu_{t_4}$ (1)

where $A_1$, $A_2$, $A_3$, $A_4$ are the surrounding cell areas, and $A$ is their sum. The above
interpolation formula reduces to linear interpolation in 1-D.
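
One area-weighted rule with the stated 1-D property is sketched below in Python. It is an assumption made for illustration and does not claim to reproduce Eq. (1) term by term: each cell-centre value is weighted by the area of the diagonally opposite cell divided by the total area. On a rectilinear grid this is bilinear interpolation, and for two cells in 1-D it reduces to linear interpolation. The function name and argument order are hypothetical.

```python
def node_value_from_cells(values, areas):
    """Interpolate a cell-centred quantity to the node shared by four cells.

    values, areas : sequences of length 4, with the cells listed in order
                    around the node, so that cell i and cell (i + 2) % 4
                    are diagonally opposite.
    Each value is weighted by the area of the diagonally opposite cell.
    """
    total = sum(areas)
    return sum(values[i] * areas[(i + 2) % 4] for i in range(4)) / total


# Cells around a node at the origin, widths 1 and 3, heights 2 and 0.5.
areas = [2.0, 6.0, 1.5, 0.5]
centres = [(-0.5, -1.0), (1.5, -1.0), (1.5, 0.25), (-0.5, 0.25)]
values = [2.0 + 3.0 * x - 4.0 * y for x, y in centres]
print(node_value_from_cells(values, areas))   # the linear field evaluated at the node
```

The weighting favours the smaller neighbouring cells, whose centres lie closer to the node, which is the behaviour expected of an interpolation on a stretched or embedded mesh.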

The  treatment  for  wakes  is  similar  with  only  a  few  modifications.  The  wake  stations 
are  arranged  in  pairs  formed  from  upper  and  lower  parts.  The  minimum  velocity  cell  is 
found by scanning through the cells of both stations of each pair. Then cells ‘migrate’
from one station to its counterpart so that those which are above the minimum velocity
cell  are  assigned  to  the  upper  station,  and  the  remaining  are  assigned  to  the  lower  station 
of  the  pair. 
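
The reassignment of wake cells can be sketched as follows. This is a hedged Python illustration: each cell is represented only by a hypothetical pair (normal coordinate, velocity magnitude), and 'above' is taken to mean a larger normal coordinate than that of the minimum velocity cell, which itself is placed in the lower station.

```python
def split_wake_pair(cells):
    """Reassign the cells of one pair of wake stations.

    cells : list of (y, speed) tuples gathered from both stations of the pair
    Returns (upper, lower): cells above the minimum-velocity cell go to the
    upper station, all remaining cells go to the lower station.
    """
    y_of_min_speed = min(cells, key=lambda cell: cell[1])[0]
    upper = [cell for cell in cells if cell[0] > y_of_min_speed]
    lower = [cell for cell in cells if cell[0] <= y_of_min_speed]
    return upper, lower
```

The normal distance used by the model in the wake would then be measured from the minimum velocity cell, in line with the distance from the wake center mentioned later in this section.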

The  above  described  implementation  of  the  algebraic  turbulence  model  on  embedded 
grids  has  been  evaluated  in  terms  of  accuracy,  computing  time  and  memory  requirements. 
The  cell  quantities  were  calculated  with  the  locally  finest  cell  accuracy  achievable  by  the 
solver. The station quantities such as $y_{max}$, $F_{max}$, however, were calculated over cells
of  a  station  which  may  include  cells  with  different  streamwise  locations.  The  essential 
assumption  here  is  that  no  significant  streamwise  variation  of  these  quantities  occurs  over 
a  station.  Another  approximation  is  the  interpolation  of  eddy  viscosity  values  from  cell- 
centers  to  nodes,  which  is  less  accurate  for  a  stretched  mesh. 

A  verification  of  the  approach  has  been  carried  out  for  the  case  of  subsonic  flow 
($M_\infty = 0.5$, $Re = 10^6$) for the 10% circular arc cascade. Transition from laminar to
turbulent flow was fixed at the middle of the bump. Two levels of embedding were used as
illustrated in Fig. 7(a), and comparisons of skin-friction distributions with [8] are shown in
Fig. 7(b). The sudden increase in $C_f$ at $X = 0.5$ corresponds to the transition point from
laminar  to  turbulent  flow.  Additional  accuracy  evaluation  cases,  for  which  experimental 
measurements  exist,  are  presented  in  a  later  section. 

Application  of  the  model  does  not  require  additional  CPU  time  compared  to  the  time 
which  is  consumed  when  applying  the  model  on  a  regular  structured  mesh.  The  pointers 
which  are  employed  are  calculated  only  once  following  each  embedding.  Overall,  the  model 
is  applied  every  five  iterations  and  consumes  approximately  2%  of  the  total  CPU  time 
required  by  the  solver. 

Two additional pieces of information are necessary when applying the model on embedded
meshes. These are the pointer which arranges the cells into stations, which is a 2-D
array with number of elements equal to the total number of cells in the domain, and the
normal distance of each cell-center from either the wall or the wake center. The additional
memory that is required is about 3%.


NUMERICAL  TREATMENTS  OF  GRID  INTERFACES 

Embedding  of  cells  introduces  internal  boundaries  (interfaces)  between  those  cells  with 
either  different  refinement  levels  or  with  different  types  of  subdivisions.  Grid  interfaces  in 
turn can be categorized into two main groups. The first are interfaces which are characterized
by an abrupt change in cell size only. The grid is continuous across the interface,
but cell metrics change as shown in Fig. 8(a) (metric discontinuous grid). The second type
includes  grid  lines  which  actually  are  interrupted  by  interfaces  as  illustrated  in  Fig.  8(b) 
(discontinuous  grid).  In  this  latter  case  cells  on  the  coarser  side  of  the  interface  may 
contain  additional  nodes  at  the  face  midpoint. 

Numerical treatments for interfaces have been examined in [9] for the potential equation,
and in [10,11,12] for the Euler equations. The presence of interfaces imposes a number
of problems on the integration scheme. The sudden change in
grid size introduces a significant stretching error, which may result in a reduction of order
of accuracy. Existing schemes have been developed for cells with nodes at only the four
cell corners, and they require some modification in order to take into account extra face
nodes.  Another  important  issue  is  maintaining  conservation  across  interfaces.  The  fluxes 
across  the  boundaries  surrounding  an  interface  cell  should  cancel  one  another  in  order  for 
the  scheme  to  be  conservative.  Other  important  issues  are  coding  complexity  and  the  ease 
with  which  an  interface  treatment  scheme  can  be  extended  to  three  dimensions. 

It  is  clear  that  these  considerations  impose  serious  limitations  on  construction  of  an 
interface  scheme,  and  that  the  above  requirements  may  easily  contradict  one  another.  In 
fact,  simultaneous  achievement  of  both  conservation  and  accuracy  is  very  difficult,  and 
even impossible in most cases. However, not all of the above requirements are important
for a specific interface. For regions in which solution variations are relatively small, clearly
a  reduction  in  order  of  accuracy  has  a  negligible  effect  on  numerical  results.  Conservation 
proves  to  be  an  important  property  mostly  in  cases  of  moving  shocks  for  accurate  prediction 
of  their  location  and  speed.  However,  it  proves  to  be  unimportant  within  a  shear  layer. 
Conversely,  accuracy  is  more  of  an  issue  in  a  boundary  layer,  since  the  second  order 
derivatives  (viscous  terms)  are  important  and  are  more  ‘sensitive’  to  grid  stretching  error 
than  are  first  order  derivatives  (inviscid  terms)  [2]. 

Consider the discontinuous and metric discontinuous grids of Fig. 9 and the
interface node $a$. For both types, cells $A_1$, $A_2$, $A_3$, $A_4$ would normally have been employed
in order to evaluate the inviscid, viscous, and smoothing contributions to node $a$. Cells
$A_3$, $A_4$ are embedded fine cells, while cells $A_1$, $A_2$ are unembedded coarse cells with vertical
dimensions twice those of cells $A_3$, $A_4$. It is clear that an evaluation, for example, of the
viscous derivative $u_{yy}$ at node $a$,

$$(u_{yy})_a = \frac{u_{a1} - 2u_a + u_{a3}}{(y_{a1} - y_{a3})^2},$$

suffers from a severe stretching error. In order to alleviate the error, use is made of the
‘parent’ cells $A_3'$, $A_4'$, with roughly the same size as the coarse cells $A_1$, $A_2$. The expression
for $u_{yy}$ now becomes

$$(u_{yy})_a = \frac{u_{a1} - 2u_a + u_{a3'}}{(y_{a1} - y_{a3'})^2}$$

(Fig.  9).  Accuracy  is  retained  at  the  interface  despite  the  abrupt  change  in  cell-size.  This 
has  proven  to  be  a  very  important  property  for  cases  in  which  the  interface  lies  within  a 
viscous  region. 
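
The size of the stretching error can be illustrated with a one-dimensional model problem in Python. The sketch is not the interface scheme itself: it assumes a 2:1 jump in spacing at the interface node, applies the uniform-grid second-difference formula with the mean spacing in the denominator (an assumed form of the untreated formula), and compares it with the same formula using a neighbour at the 'parent' spacing. All names are hypothetical.

```python
import math

def second_difference(u, y0, h_up, h_down):
    """Uniform-grid second-difference formula applied with neighbours at
    y0 + h_up and y0 - h_down; the denominator uses the mean spacing."""
    h_mean = 0.5 * (h_up + h_down)
    return (u(y0 + h_up) - 2.0 * u(y0) + u(y0 - h_down)) / h_mean ** 2

def interface_errors(u, d2u, y0, h):
    """Errors in the second derivative at a 2:1 interface node.

    The coarse side has spacing h. Returns a pair:
    (error using the fine neighbour at h / 2,
     error using a 'parent' neighbour at the coarse spacing h).
    """
    exact = d2u(y0)
    fine = second_difference(u, y0, h, 0.5 * h) - exact
    parent = second_difference(u, y0, h, h) - exact
    return fine, parent

for h in (0.01, 0.005):
    fine, parent = interface_errors(math.exp, math.exp, 0.0, h)
    print(h, fine, parent)
```

In this model the error of the untreated formula grows as the mesh is refined, because the unequal spacings leave a first-derivative term divided by the spacing, whereas the formula based on the parent spacing remains second-order accurate. This is consistent with the statement above that accuracy is retained at the interface despite the abrupt change in cell size.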

In  the  case  of  a  discontinuous  interface,  the  treatment  ignores  the  interface  face  node 
$c$, and the state-variable values at that node are obtained through interpolation from
corresponding values at interface nodes $b$, $a$. The above also ‘ignores’ the type of division
(directional or two-directional) of the interface cells, and both discontinuous and metric
discontinuous types of interfaces are treated identically, which simplifies the interface
algorithm significantly. This is very important, if extension of the algorithm to three
dimensions  is  of  interest. 

However, the above does suffer from a nonconservation error. Use of the ‘parent’ cells
$A_3'$, $A_4'$ instead of the fine cells $A_3$, $A_4$, as well as the averaging that is used for the middle
interface node $c$, introduces this error. The integration would be conservative
if the sum

$$\sum_{cells} \frac{S}{\Delta t}\, \delta U \qquad (2)$$

where $\delta U$ is the change in time of the state variables, $S$ is the cell area, and $\Delta t$ is the
time-step, consisted only of contributions from the boundary nodes [2]. The nonconservation
error  is  maximum  in  the  case  of  a  shock  which  is  located  at  and  parallel  to  an  interface. 
When  the  shock  is  located  at  coarse  cell  Ai ,  for  example,  the  nonconservation  error  is  of 
order 

6f'P  6?lU.  (3) 

In  the  above  expression  6V  denotes  variation  along  one  of  the  cell-directions.  However,  if  the 
shock  is  located  just  one  cell  away  from  the  interface,  the  error  vanishes.  In  practice,  many 
schemes  capture  a  shock  within  approximately  four  cells,  which  implies  that  interfaces 
should  be  at  least  four  cells  away  from  shocks  which  are  parallel  to  them.  Shock  detection 
can  be  based  on  monitoring  6aU,  S^U  which  is  maximum  at  the  shock  and  therefore,  the 
center  of  the  embedded  region  will  be  at  the  shock. 


APPLICATIONS  TO  AIRFOIL  FLOWS 

The  adaptive  algorithm  has  been  applied  to  airfoil  flow  fields  for  relatively  high  Reynolds 
numbers.  Two  kinds  of  geometries  were  considered.  The  first  involves  a  single  element 
NACA 0012 airfoil; the second is a two-element NLR airfoil, consisting of a main airfoil
with a flap. Specifically, the NACA 0012 airfoil is considered for both subsonic and transonic
flow, while the flow past the two-element airfoil configuration is low subsonic. The
numerical  results  are  compared  with  corresponding  experimental  measurements  [13,14]. 


NACA  0012 

The subsonic flow conditions were: $M_\infty = 0.50$, $Re = 2.91 \times 10^6$, $\alpha = 1.77^\circ$. An initial
C-mesh of 33x17 points was employed, with two levels of embedding resulting in a final
number of 5225 cells within the domain. The minimum grid normal spacing at the airfoil
leading edge was $9 \times 10^{-5}$ chord lengths, while that for the trailing edge region was $9 \times 10^{-4}$.
The  spacing  in  the  streamwise  direction  at  the  leading  edge  region  was  0.002,  while  the 
corresponding  spacing  at  the  trailing  edge  was  0.026.  The  resulting  flow  field  is  depicted 
in  Fig.  10  in  terms  of  Mach  number  contour  plots. 

The  experiment  [13]  provided  pressure  distribution  data,  which  are  compared  with  the 
numerical  results  in  Fig.  11;  the  comparison  shows  very  good  agreement  between  numerics 
and measurements. The computed $C_l$ of 0.192 compares very well with the experimental
value  of  0.195. 

The transonic flow conditions were: $M_\infty = 0.754$, $Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$. An initial
C-mesh of 65x41 points was applied with the farfield boundary placed 15 chord lengths
away from the airfoil. Three levels of embedding are introduced by the algorithm (with
the third level being directional), resulting in the final grid illustrated in Fig. 12. The
final number of cells within the domain is 40440. The minimum grid normal spacing at
the airfoil leading edge is $2 \times 10^{-5}$ chord lengths, while the spacing at the trailing edge
region is $2 \times 10^{-4}$. The spacing in the streamwise direction at the leading edge region is
$3 \times 10^{-4}$, while the corresponding spacing at the trailing edge is 0.004.

The case took 5000 iterations to converge, and the computing time consumed was 8.5
hours. Figure 13 illustrates the flow field in terms of Mach number contours. A shock
forms  on  the  suction  side  at  40%  of  the  chord,  with  the  Mach  number  just  upstream  of  the 
shock  being  1.31.  The  boundary  layer  on  the  suction  side  of  the  airfoil  starts  to  thicken 
upstream  of  the  shock  and  separates  at  X  =  0.82  close  to  the  trailing  edge.  On  the  other 
hand,  the  pressure  side  boundary  layer  is  considerably  thinner  and  remains  attached  to 
the  surface.  The  oscillations  that  are  observed  just  upstream  of  the  shock  are  indicative  of 
odd-even  modes.  They  exist  due  to  the  low  values  of  artificial  viscosity  that  were  used  to 
ensure  that  the  solution  within  the  viscous  region  does  not  become  contaminated.  These 
oscillations  apparently  do  not  induce  inaccuracies  into  the  solution  since  the  shock  location 
is  predicted  accurately. 

The  accuracy  of  the  procedure  also  may  be  examined  by  comparing  the  experimental 
pressure  coefficient  wall  distribution  with  the  corresponding  numerical  result  (Fig.  14). 
The  shock  location  is  predicted  accurately  although  it  is  a  little  more  smeared.  A  fourth 
level  of  embedding  (that  would  provide  a  more  ‘crisp’  shock),  was  not  allowed  due  to 
computer  limitations.  The  agreement  remains  good  downstream  of  the  shock.  However, 
as  the  trailing  edge  is  approached,  the  boundary  layer  does  not  resist  the  adverse  pressure 
gradient  and  separates,  causing  the  pressure  distribution  to  tend  to  level  out.  Such  trailing 
edge  separation  is  not  observed  in  the  experiment.  The  algebraic  turbulence  model  that 
is employed is believed to be largely responsible for this behaviour as has been concluded
by  comparative  studies  of  different  turbulence  models  for  transonic  airfoils  [15,16].  The 
pressures  on  the  pressure  and  suction  sides  match  at  the  trailing  edge  and  the  somewhat 
lower  pressure  level  at  the  suction  side  influences  the  pressure  side  distribution  causing  it 
to  deviate  slightly  from  the  experimental  results.  The  deviation  is  approximately  the  same 
over most of the pressure surface. Unfortunately, skin-friction distribution measurements
were not performed.


Two-Element Airfoil

To  date,  the  existing  numerical  results  for  multi-element  airfoils  include  panel  methods 
and  Euler  computations  [17,18,19,20,21],  and  a  viscous-inviscid  interaction  scheme  [22]. 
However,  the  full  Navier-Stokes  equations  sometimes  are  required  in  order  to  describe 
important  flow  physics.  The  present  adaptive  algorithm  appears  to  be  promising  for  such 
computations. Although the use of quadrilateral meshes in the past has yielded quite
awkward grids, the introduction of adaptation is of some interest as a route to better
mesh topologies. Finally, the use of quadrilateral rather than, for example, triangular
meshes provides a test of their suitability for computations involving complex geometries.

The  basic  airfoil  section  is  a  NLR  7301  airfoil.  The  flap  chord  is  0.32c,  where  c  is  the 
main  airfoil  chord  (Fig.  15).  The  overlap  region  between  the  main  airfoil  and  the  flap  is 
0.053c, and a gap width of 0.026c was considered. During the experiment only a single
flap deflection angle $\delta$ of 20 degrees down was considered, and this was duplicated for
the numerical simulation. The two-element configuration is illustrated in Fig. 15.

The flow conditions were: $M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$. Both laminar and
turbulent flow regions were observed during the experiment, and therefore the measured
surface transition locations were employed by the algorithm since a transition model has
not been incorporated into the solver.

An initial H-grid of 77x103 points was employed. Two levels of embedding were used,
which resulted in 50185 cells over the entire domain. The minimum grid normal
spacing at both leading edges is $10^{-4}$ chord lengths. The algorithm took 5000 iterations
to  reduce  the  residual  by  three  orders  of  magnitude  and  required  24  hours  on  an  Alliant 
FX/8  with  three  processors.  The  free-stream  Mach  number  is  quite  low,  which  makes 
computations  with  an  explicit  scheme  more  expensive  due  to  the  lower  time-steps  that  are 
employed.

The numerical results may be compared with corresponding experimental results. Figure 16
shows good agreement between numerical and measured pressure coefficient values
over both the main airfoil and the flap surfaces, with the exception of the airfoil leading
edge suction peak. Higher resolution in the streamwise direction is required in order to
predict the magnitude of the suction peak accurately. U-velocity profiles at various locations
are compared with experimental values in Fig. 17; these are believed to be the first comparisons
of that kind that are available. The locations of both measurements and numerical
profiles are given in the figures. Locations on both the airfoil and flap surfaces were chosen,
with  the  profiles  on  the  lower  surfaces  being  inverted  in  the  figure.  Since  the  surface  grid 
locations  do  not  exactly  coincide  with  the  measurement  locations,  the  closest  surface  grid 
location  was  picked  for  the  comparisons.  The  comparison  includes  profiles  in  both  laminar 
and  turbulent  regions.  In  the  turbulent  region,  the  ‘laminar’  sublayer  is  not  resolved  by 
the two-level embedded grid. A third level of adaptation would be required, which would lead to
a prohibitively expensive computation in terms of computing time and memory requirements
for the computing system that was available. Overall, the numerical results are in
reasonable agreement with the measurements. It should be emphasized that in addition to
the  slightly  different  surface  locations  that  were  used,  some  small  three-dimensional  effects 
were  reported  in  the  experiment. 


SUMMARY 

•  Combination  of  both  grid  embedding  and  grid  redistribution  has  been  applied  for 
viscous  flow  adaptive  computations. 

•  A  procedure  which  allows  a  spatial  variation  of  the  time  steps  while  simultaneously 
maintaining time accuracy was developed.

• An algebraic turbulence model has been applied to embedded grids in a relatively
simple and accurate manner.

• An interface treatment that avoids interface grid stretching error, but is nonconservative,
is proposed for viscous flows that do not include moving shocks.

• The overall adaptive algorithm has been applied to flow fields which are relatively
complex, as well as to flows that involve complicated geometries. Comparisons between
numerical and experimental results permit evaluation of the accuracy of the algorithm.


FIGURE LABELS


Figure 1 : Integration cell

Figure 2(a) : flow field

Figure 2(b) : portion of mesh before redistribution

Figure 2(c) : portion of mesh after redistribution

Figure 2 : Initial grid redistribution - Transonic NACA 0012 ($M_\infty = 0.754$,
$Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$) - vertical scales enlarged

Figure 3 : Spatially varying time-steps

Figure 4(a) : embedded mesh

Figure 4(b) : velocity history at X = 0.5 , Y = 0.6

Figure 4 : Time accuracy for oscillating inlet flow

Figure 5 : Cells grouped by stations

Figure 6 : Interpolation of $\mu_t$ values from cell-centers to node 0

Figure 7(a) : two-level embedded mesh

Figure 7(b) : comparison of wall-shear distributions: - 2-level embedded
mesh , • [8]

Figure 7 : Turbulent flow model case - 10% circular arc cascade ($M_\infty = 0.5$,
$Re = 10^6$)

Figure 8(a) : metric discontinuous

Figure 8(b) : discontinuous

Figure 8 : Types of grid interfaces

Figure 9 : Interface treatment without stretching error

Figure 10 : Flow field around a subsonic NACA 0012 ($M_\infty = 0.5$,
$Re = 2.91 \times 10^6$, $\alpha = 1.77^\circ$) - horizontal scale enlarged

Figure 11 : Comparison of pressure coeff. distributions - Subsonic
NACA 0012 ($M_\infty = 0.5$, $Re = 2.91 \times 10^6$, $\alpha = 1.77^\circ$)
- present work , • experiment [13]

Figure 12 : Final 3-level embedded grid - Transonic NACA 0012 ($M_\infty = 0.754$,
$Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$)

Figure 13 : Flow field around transonic NACA 0012 ($M_\infty = 0.754$,
$Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$)

Figure 14 : Comparison of pressure coeff. distribution with experiment
- Transonic NACA 0012 ($M_\infty = 0.754$, $Re = 3.76 \times 10^6$, $\alpha = 3.02^\circ$)
- embedded mesh , • pressure side [13] , ▲ suction side [13]

Figure 15 : Two-Element airfoil geometry

Figure 16 : Pressure coeff. comparison with experimental results - Two-Element
airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 20^\circ$)
- present work , (•, ■) experiment [14]

Figure 17 : Comparison of U-velocity profiles with experimental results -
Two-Element airfoil ($M_\infty = 0.185$, $Re = 2.51 \times 10^6$, $\alpha = 6.0^\circ$, $\delta = 20^\circ$)
■ present work , • experiment [14]